Preserving differential privacy in convolutional deep belief networks
Abstract
The remarkable development of deep learning in the medicine and healthcare domain presents obvious privacy issues when deep neural networks are built on users’ personal and highly sensitive data, e.g., clinical records, user profiles, biomedical images, etc. However, only a few scientific studies on preserving privacy in deep learning have been conducted. In this paper, we focus on developing a private convolutional deep belief network (pCDBN), which is essentially a convolutional deep belief network (CDBN) under differential privacy. Our main idea for enforcing \(\epsilon \)-differential privacy is to leverage the functional mechanism to perturb the energy-based objective functions of traditional CDBNs, rather than their results. One key contribution of this work is that we propose the use of Chebyshev expansion to derive the approximate polynomial representation of objective functions. Our theoretical analysis shows that we can further derive the sensitivity and error bounds of the approximate polynomial representation. As a result, preserving differential privacy in CDBNs is feasible. We applied our model to a health social network, i.e., YesiWell data, and to a handwritten digit dataset, i.e., MNIST data, for human behavior prediction, human behavior classification, and handwritten digit recognition tasks. Theoretical analysis and rigorous experimental evaluations show that the pCDBN is highly effective. It significantly outperforms existing solutions.
Keywords
Deep learning, Differential privacy, Human behavior prediction, Health informatics, Image classification
1 Introduction
Today, amid rapid adoption of electronic health records and wearables, global health care systems are systematically collecting longitudinal patient health information, e.g., diagnoses, medication, lab tests, procedures, demography, clinical notes, etc. The patient health information is generated by one or more encounters in any healthcare delivery system (Jamoom et al. 2016). Healthcare data is now measured in exabytes, and it will reach the zettabyte and the yottabyte range in the near future (Fang et al. 2016). Although appropriate in a variety of situations, many traditional methods of analysis do not automatically capture complex and hidden features from large-scale and perhaps unlabeled data (Miotto et al. 2016). In practice, many health applications depend on including domain knowledge to construct relevant features, some of which are further based on supplemental data. This process is neither straightforward nor fast, and it may result in missed opportunities to discover novel patterns and features.
This is where deep learning, one of the state-of-the-art machine learning techniques, comes in to take advantage of the potential that large-scale healthcare data holds, especially in the age of digital health. Deep neural networks can discover novel patterns and dependencies in both unlabeled and labeled data by applying state-of-the-art training algorithms, e.g., greedy layer-wise training (Hinton et al. 2006), the contrastive divergence algorithm (Hinton 2002), etc. That makes it easier to extract useful information when building classifiers and predictors (LeCun et al. 2015).
Deep learning has applications in a number of healthcare areas, e.g., phenotype extraction and health risk prediction (Cheng et al. 2016), prediction of the development of various diseases including schizophrenia, a variety of cancers, diabetes, heart failure, etc. (Choi et al. 2016; Li et al. 2015; Miotto et al. 2016; Roumia and Steinhubl 2014; Wu et al. 2010), prediction of risk of readmission (Wu et al. 2010), Alzheimer’s diagnosis (Liu et al. 2014; Ortiz et al. 2016), risk prediction for chronic kidney disease progression (Perotte et al. 2015), physical activity prediction (Phan et al. 2015a, b, 2016a, c), feature learning from fMRI data (Plis et al. 2014), diagnosis code assignment (Gottlieb et al. 2013; Perotte et al. 2014), reconstruction of brain circuits (Helmstaedter et al. 2013), prediction of the activity of potential drug molecules (Ma et al. 2015), the effects of mutations in noncoding DNA on gene expressions (Leung et al. 2014; Xiong et al. 2015), and many more.
The development of deep learning in the domain of medicine and healthcare presents obvious privacy issues when deep neural networks are built on patients’ personal and highly sensitive data, e.g., clinical records, user profiles, biomedical images, etc. To convince individuals to allow their data to be included in deep learning projects, principled and rigorous privacy guarantees must be provided. However, only a few deep learning techniques that incorporate privacy protections have been developed so far. In clinical trials, such lack of protection and efficacy may put patient data at high risk and expose healthcare providers to legal action based on HIPAA/HITECH law (U.S. Department of Health and Human Services 2016a, b). Motivated by this, we aim to develop an algorithm to preserve privacy in fundamental deep learning models in this paper.
Releasing sensitive results of statistical analyses and data mining while protecting privacy has been studied in the past few decades. One state-of-the-art privacy model is \(\epsilon \)-differential privacy (Dwork et al. 2006). A differential privacy model ensures that the adversary cannot infer any information about any particular data record with high confidence (controlled by a privacy budget \(\epsilon \)) from the released learning models. This strong standard for privacy guarantees remains valid even if the adversary possesses all the remaining tuples of the sensitive data. The privacy budget \(\epsilon \) controls the amount by which the output distributions induced by two neighboring databases may differ. We say that two databases are neighboring if they differ in a single data record, that is, if one data record is present in one database and absent in the other. Smaller values of \(\epsilon \) enforce a stronger privacy guarantee, because it becomes more difficult to infer any particular data record by distinguishing two neighboring databases from the output distributions. Differential privacy has been studied from the theoretical perspective, e.g., Chaudhuri and Monteleoni (2008a), Hay et al. (2010), Kifer and Machanavajjhala (2011) and Lee and Clifton (2012). Different types of mechanisms [e.g., the Laplace mechanism (Dwork et al. 2006), the smooth sensitivity (Nissim et al. 2007), the exponential mechanism (McSherry and Talwar 2007a), and the perturbation of objective function (Chaudhuri and Monteleoni 2008a)] have been studied to enforce differential privacy.
Combining differential privacy and deep learning, i.e., the two state-of-the-art techniques in privacy preservation and machine learning, is timely and crucial. This is a nontrivial task, and therefore only a few scientific studies have been conducted. In Shokri and Shmatikov (2015), the authors proposed a distributed training method, which directly injects noise into the gradient descent updates of parameters, to preserve privacy in neural networks. The method is attractive for applications of deep learning on mobile devices. However, it may consume an unnecessarily large portion of the privacy budget to ensure model accuracy, as the number of training epochs and the number of shared parameters among multiple parties are often large. To improve this, based on the composition theorem (Dwork and Lei 2009), Abadi et al. (2016) proposed a privacy accountant, which keeps track of privacy spending and enforces applicable privacy policies. The approach is still dependent on the number of training epochs. With a small privacy budget \(\epsilon \), only a small number of epochs can be used to train the model. In practice, that could affect the model utility when the number of training epochs needs to be large to guarantee the model accuracy.
Recently, Phan et al. (2016c) proposed deep private autoencoders (dPAs), in which differential privacy is enforced by perturbing the objective functions of deep autoencoders (Bengio 2009). It is worth noting that the privacy budget consumed by dPAs is independent of the number of training epochs. A different method, named CryptoNets, was proposed in Dowlin et al. (2016) for applying neural networks to encrypted data. A data owner can send their encrypted data to a cloud service that hosts the network, and get encrypted predictions in return. This method differs from our setting, since it does not aim at releasing learning models under privacy protections.
Existing differential privacy preserving algorithms in deep learning raise major concerns about their applicability. They are either designed for a specific deep learning model, i.e., deep autoencoders (Phan et al. 2016c), or they are affected by the number of training epochs (Shokri and Shmatikov 2015; Abadi et al. 2016). Therefore, there is an urgent demand for a privacy preserving framework such that: (1) its privacy budget consumption is totally independent of the number of training epochs; and (2) it has the potential to be applied in typical energy-based deep neural networks. Such frameworks will significantly promote the application of privacy preservation in deep learning.
Motivated by this, we aim at developing a private convolutional deep belief network (pCDBN), which is essentially a convolutional deep belief network (CDBN) (Lee et al. 2009) under differential privacy. The CDBN is a typical and well-known deep learning model. It is an energy-based model. Preserving differential privacy in CDBNs is nontrivial, since CDBNs are more complicated than other fundamental models, such as autoencoders and Restricted Boltzmann Machines (RBM) (Smolensky 1986), in terms of structural designs and learning algorithms. In fact, a CDBN contains multiple groups of hidden units, and parameters are shared within each group. Inappropriate analysis might result in consuming too much of the privacy budget in training phases. The privacy consumption also must be independent of the number of training epochs to guarantee the potential to work with large datasets.
Our key idea is, first, to apply the Chebyshev Expansion (Rivlin 1990) to derive polynomial approximations of the nonlinear objective functions used in CDBNs, such that the design of differential privacy-preserving deep learning is feasible. Second, we inject noise into these polynomial forms, so that \(\epsilon \)-differential privacy is satisfied in the training phase of each hidden layer by leveraging the functional mechanism (Zhang et al. 2012). Third, hidden layers now become private hidden layers, which can be stacked on each other to produce a private convolutional deep belief network (pCDBN).
To demonstrate the effectiveness of our framework, we applied our model to binomial human behavior prediction and classification tasks in a health social network. A novel human behavior model based on the pCDBN is proposed to predict whether an overweight or obese individual will increase physical exercise in a real health social network. To illustrate the ability of our model to work with large-scale datasets, we also conducted additional experiments on the well-known handwritten digit dataset (MNIST data) (Lecun et al. 1998). We compare our model with the private stochastic gradient descent algorithm, denoted pSGD, from Abadi et al. (2016), and the deep private autoencoders (dPAs) (Phan et al. 2016c). The pSGD and dPAs are the state-of-the-art algorithms in preserving differential privacy in deep learning. Theoretical analysis and rigorous experimental evaluations show that our model is highly effective. It significantly outperforms existing solutions.
The rest of the paper is organized as follows. In Sect. 2, we introduce preliminaries and related works. We present our private convolutional deep belief network in Sect. 3. The experimental evaluation is in Sect. 4, and we conclude the paper in Sect. 5.
2 Preliminaries and related works
In this section, we briefly revisit the definition of differential privacy, the functional mechanism (Zhang et al. 2012), convolutional deep belief networks (Lee et al. 2009), and the Chebyshev Expansion (Rivlin 1990). Let D be a database that contains n tuples \(t_1, t_2, \ldots , t_n\) and \(d + 1\) attributes \(X_1, X_2, \ldots , X_d, Y\). For each tuple \(t_i = (x_{i1}, x_{i2}, \ldots , x_{id}, y_i)\), we assume, without loss of generality, \(\sqrt{\sum ^d_{j=1} x^2_{ij}} \le 1\) where \(x_{ij} \ge 0\) and \(y_i\) follows a binomial distribution. Our objective is to construct a deep neural network \(\rho \) from D that (i) takes \(\mathbf {x}_i = (x_{i1}, x_{i2}, \ldots , x_{id})\) as input and (ii) outputs a prediction of \(y_i\) that is as accurate as possible. \(t_i\) and \(\mathbf {x}_i\) are used interchangeably to indicate the data tuple i. The model function \(\rho \) contains a model parameter vector W. To evaluate whether W leads to an accurate model, a cost function \(f_D(W)\) is often used to measure the difference between the original and predicted values of \(y_i\). As the released model parameter W may disclose sensitive information of D, to protect privacy, we require that the model training be performed with an algorithm that satisfies \(\epsilon \)-differential privacy.
Differential privacy (Dwork et al. 2006) establishes a strong standard for privacy guarantees for algorithms, e.g., training algorithms of machine learning models, on aggregate databases. It is defined in the context of neighboring databases. We say that two databases are neighboring if they differ in a single data record, that is, if one data record is present in one database and absent in the other. The definition of differential privacy is as follows:
Definition 1
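In its standard form (Dwork et al. 2006), the definition can be stated as below. The symbols \(A\) (a randomized algorithm), \(D\) and \(D'\) (two neighboring databases), and \({\varOmega }\) (an output of \(A\)) are notation assumed here for illustration. A randomized algorithm \(A\) fulfills \(\epsilon \)-differential privacy if, for any two neighboring databases \(D\) and \(D'\) and for any output \({\varOmega }\) of \(A\),

\[ Pr\big [A(D) = {\varOmega }\big ] \le e^{\epsilon } \, Pr\big [A(D') = {\varOmega }\big ] \]

A smaller \(\epsilon \) forces the two output distributions closer together, which matches the observation that smaller privacy budgets give stronger guarantees.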
Differential privacy has been studied extensively, from both the theoretical perspective, e.g., Chaudhuri and Monteleoni (2008b), Kifer and Machanavajjhala (2011), and the application perspective, e.g., data collection (Erlingsson et al. 2014), data streams (Chan et al. 2012), stochastic gradient descent (Song et al. 2013), recommendation (McSherry and Mironov 2009), regression (Chaudhuri and Monteleoni 2008b), online learning (Jain et al. 2012), publishing contingency tables (Xiao et al. 2010), and spectral graph analysis (Wang et al. 2013). The mechanisms for achieving differential privacy mainly include the classic approach of adding Laplacian noise (Dwork et al. 2006), the exponential mechanism (McSherry and Talwar 2007b), and the functional perturbation approach (Chaudhuri and Monteleoni 2008b).
2.1 Functional mechanism revisited
The functional mechanism (Zhang et al. 2012) is an extension of the Laplace mechanism. It achieves \(\epsilon \)-differential privacy by perturbing the objective function \(f_D(W)\) and then releasing the model parameter \(\overline{W}\) that minimizes the perturbed objective function \(\overline{f}_D(W)\) instead of the original one. The functional mechanism exploits the polynomial representation of \(f_D(W)\). The model parameter W is a vector that contains d values \(W_1, \ldots , W_d\). Let \(\phi (W)\) denote a product of \(W_1, \ldots , W_d\), namely, \(\phi (W) = W^{c_1}_1 \cdot W^{c_2}_2 \cdots W^{c_d}_d\) for some \(c_1, \ldots , c_d \in \mathbb {N}\). Let \({\varPhi }_j (j \in \mathbb {N})\) denote the set of all products of \(W_1, \ldots , W_d\) with degree j, i.e., \({\varPhi }_j = \big \{W^{c_1}_1 \cdot W^{c_2}_2 \cdots W^{c_d}_d \big \vert \sum _{l = 1}^d c_l = j \big \}\). By the Stone-Weierstrass Theorem, any continuous and differentiable \(f(t_i, W)\) can always be written as a polynomial of \(W_1, \ldots , W_d\), for some \(J \in [0, \infty ]\), i.e., \(f(t_i, W) = \sum _{j = 0}^J\sum _{\phi \in {\varPhi }_j}\lambda _{\phi t_i}\phi (W)\), where \(\lambda _{\phi t_i} \in \mathbb {R}\) denotes the coefficient of \(\phi (W)\) in the polynomial. Note that \(t_i\) and \(\mathbf {x}_i\) are used interchangeably to indicate the data tuple i.
Lemma 1
To achieve \(\epsilon \)-differential privacy, \(f_D(W)\) is perturbed by injecting Laplace noise \(Lap(\frac{{\varDelta }}{\epsilon })\) into its polynomial coefficients \(\lambda _{\phi }\), and then the model parameter \(\overline{W}\) is derived to minimize the perturbed function \(\overline{f}_D(W)\), where \({\varDelta }= 2 \max _t \sum _{j = 1}^J \sum _{\phi \in {\varPhi }_j} \Vert \lambda _{\phi t}\Vert _1\), according to Lemma 1.
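The perturbation step can be illustrated with a short sketch in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the list-of-lists layout for the per-tuple coefficients, and the toy numbers are all hypothetical. The sensitivity is passed in as an analytical bound over all possible tuples; it must not be estimated from the data itself.

```python
import random

def sample_laplace(scale, rng):
    """Draw one sample from Lap(scale): the difference of two i.i.d.
    exponential variables with mean `scale` is Laplace-distributed."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def perturb_coefficients(per_tuple_coeffs, sensitivity, epsilon, rng):
    """Sum per-tuple polynomial coefficients and add Lap(sensitivity / epsilon)
    noise to every aggregated coefficient.

    per_tuple_coeffs: list of equal-length lists; row i holds the coefficients
        of tuple t_i (hypothetical layout chosen for this sketch).
    sensitivity: analytical bound Delta on the L1 change of the aggregated
        coefficients between neighboring databases.
    """
    num_coeffs = len(per_tuple_coeffs[0])
    aggregated = [sum(row[k] for row in per_tuple_coeffs) for k in range(num_coeffs)]
    scale = sensitivity / epsilon
    return [value + sample_laplace(scale, rng) for value in aggregated]

rng = random.Random(0)
# Toy data: three tuples whose coefficient vectors have L1 norm at most 1,
# so the bound Delta = 2 * 1 = 2 is used below (an assumption of this example).
coeffs = [[0.5, -0.25, 0.25], [0.1, 0.4, -0.5], [0.0, 0.3, 0.7]]
noisy = perturb_coefficients(coeffs, sensitivity=2.0, epsilon=1.0, rng=rng)
```

The noisy coefficients define the perturbed objective, which is then minimized as usual. Because the noise is added once to the objective rather than at every update, the privacy cost of this step does not grow with the number of optimization iterations.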
2.2 Convolutional deep belief networks
We can use the layer-wise unsupervised training algorithm (Bengio et al. 2007) and back-propagation to train CDBNs.
2.3 Chebyshev polynomials
In principle, many polynomial approximation techniques, e.g., Taylor Expansion, Bernoulli polynomial, Euler polynomial, Fourier series, Discrete Fourier transform, Legendre polynomial, Hermite polynomial, Gegenbauer polynomial, Laguerre polynomial, Jacobi polynomial, and even the state-of-the-art techniques of the twentieth century, including spectral methods and Finite Element methods (Harper 2012), can be applied to approximate the nonlinear energy functions used in CDBNs. However, figuring out an appropriate way to use each of them is nontrivial. First, estimating the lower and upper bounds of the approximation error incurred by applying a particular polynomial in deep neural networks is very challenging. A strong guarantee on the approximation error incurred by any approximation approach is important to ensure model utility in deep neural networks. In addition, the approximation error bounds must be independent of the number of data instances to guarantee that the approach can be applied to large datasets without consuming excessive privacy budgets.
Given these challenges, the Chebyshev polynomial stands out. The most important reason for using the Chebyshev polynomial is that the upper and lower bounds of the error incurred by approximating activation functions and energy functions can be estimated and proved, as shown in the next section. Furthermore, these error bounds do not depend on the number of data instances, as we will present in Sect. 3.4. This is a substantial result when working with complex models, such as deep neural networks on large-scale datasets. In addition, Chebyshev polynomials are well-known, efficient, and widely used in many real-world applications (Mason and Handscomb 2002). Therefore, we propose to use Chebyshev polynomials in our work to preserve differential privacy in convolutional deep belief networks.
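To make the idea concrete, the following sketch computes a truncated Chebyshev expansion of the sigmoid function on \([-1, 1]\) and evaluates it with the three-term recurrence. It is a generic illustration in plain Python, not the derivation used in this paper: the function names, the interval \([-1, 1]\), the use of 64 Chebyshev-Gauss nodes, and the degree 7 (matching the value of L reported in the experiments) are assumptions of the example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def chebyshev_coefficients(f, degree, num_nodes=64):
    """Approximate the Chebyshev coefficients of f on [-1, 1] using the
    discrete orthogonality of T_k at the Chebyshev-Gauss nodes."""
    angles = [math.pi * (j + 0.5) / num_nodes for j in range(num_nodes)]
    values = [f(math.cos(angle)) for angle in angles]
    coeffs = []
    for k in range(degree + 1):
        # T_k(cos(angle)) = cos(k * angle)
        total = sum(v * math.cos(k * angle) for v, angle in zip(values, angles))
        coeffs.append(2.0 * total / num_nodes)
    coeffs[0] *= 0.5  # the k = 0 term carries half the weight
    return coeffs

def chebyshev_eval(coeffs, x):
    """Evaluate sum_k coeffs[k] * T_k(x) with T_{k+1} = 2x T_k - T_{k-1}."""
    t_prev, t_curr = 1.0, x
    total = coeffs[0]
    for k in range(1, len(coeffs)):
        total += coeffs[k] * t_curr
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return total

coeffs = chebyshev_coefficients(sigmoid, degree=7)
grid = [-1.0 + 2.0 * i / 200 for i in range(201)]
max_error = max(abs(sigmoid(x) - chebyshev_eval(coeffs, x)) for x in grid)
```

The resulting expression is a polynomial in x, so once the argument of the sigmoid is written in terms of the model parameters, the objective becomes a polynomial in those parameters, which is the form the functional mechanism requires.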
3 Private convolutional deep belief network
In this section, we formally present our framework (Algorithm 1) to develop a convolutional deep belief network under \(\epsilon \)-differential privacy. Intuitively, the algorithm used to develop dPAs can be applied to CDBNs. However, the main issue is that their approximation technique has been especially designed for cross-entropy error-based objective functions (Bengio 2009). There are many challenges in adapting their technique to CDBNs. The cross-entropy error-based objective function is very different from the energy-based objective function (Eq. 7). As such: (1) it is difficult to derive its global sensitivity used in the functional mechanism, and (2) it is difficult to identify the approximation error bounds in CDBNs. To achieve private convolutional deep belief networks (pCDBNs), we propose a new approach that uses the Chebyshev Expansion (Rivlin 1990) to derive polynomial approximations of nonlinear energy-based objective functions (Eq. 7), such that differential privacy can be preserved by leveraging the functional mechanism.

First, we derive a polynomial approximation of the energy-based function E(D, W) (Eq. 7), using the Chebyshev Expansion. The polynomial approximation is denoted as \(\widehat{E}(D, W)\).

Second, the functional mechanism is used to perturb the approximation function \(\widehat{E}(D, W)\); the perturbed function is denoted as \(\overline{E}(D, W)\). We introduce a new result of sensitivity computation for CDBNs. Next, we train the model to obtain the optimal perturbed parameters \(\overline{W}\) by using gradient descent. That results in private hidden layers, which are used to produce max-pooling layers. Note that we do not need to enforce differential privacy in max-pooling layers, because max-pooling layers act only as signal filters.

Third, we stack multiple pairs of a private hidden layer and a max-pooling layer (H, P) on top of each other to construct the private convolutional deep belief network (pCDBN).

Finally, we apply the technique presented in Phan et al. (2016c) to enforce differential privacy in the softmax layer for prediction and classification tasks.
3.1 Polynomial approximation of the energy function
There are two challenges in the energy function E(D, W) that prevent us from applying it for private data reconstruction analysis: (1) Gibbs sampling is used to estimate the value of every \(h^k_{ij}\); and (2) the probability that \(h^k_{ij}\) equals 1 is a sigmoid function, which is not a polynomial function of the parameters \(W^k\). Therefore, it is difficult to derive the sensitivity and error bounds of the approximate polynomial representation of the energy function E(D, W). Perturbing Gibbs sampling is challenging. Meanwhile, injecting noise into the results of Gibbs sampling will significantly affect the properties of hidden variables, e.g., values of hidden variables might fall outside their original bounds, [0, 1].
3.2 Perturbation of objective functions
We employ the functional mechanism (Zhang et al. 2012) to perturb the objective function \(\widehat{E}(\cdot )\) by injecting Laplace noise into its polynomial coefficients. The hidden layer contains K groups of hidden units. Each group is trained with a local region of input neurons, and the groups are not merged with each other in the learning process. Therefore, it is not necessary to aggregate the sensitivities of the training algorithm over the K groups into the sensitivity of the function \(\widehat{E}(\cdot )\). Instead, the sensitivity of the function \(\widehat{E}(\cdot )\) can be taken as the maximal sensitivity of any single group. As a result, the sensitivity of the function \(\widehat{E}(\cdot )\) can be computed as in the following lemma.
Lemma 2
Proof
We use gradient descent to train the perturbed model \(\overline{E}(\cdot )\). That results in private hidden layers. To construct a private convolutional deep belief network (pCDBN), we stack multiple private hidden layers and max-pooling layers on top of each other. The pooling layers only play the roles of signal filters of the private hidden layers. Therefore, there is no need to enforce privacy in max-pooling layers.
3.3 Perturbation of softmax layer
3.4 Approximation error bounds
The following lemma shows how much error our approximation approach incurs; the average error of the approximation is always bounded:
Lemma 3
Proof
The approximation error depends on the structure of the energy function E(D, W), i.e., the number of hidden neurons \(N^2_H K\) and \(\vert A_{L + 1}\vert \), and the number of attributes of the dataset. Lemma 3 can be used to determine when to stop learning the approximation model. For each group of \(N_H^2\) hidden units, the upper bound of the sum of squared errors is only \(\frac{\pi }{4}N_H^2 \vert A_{L + 1}\vert \), where \(\vert A_{L + 1}\vert \) is tiny when L is large enough.
Importantly, Lemmas 2 and 3 show that the sensitivity \({\varDelta }\) and the approximation error bounds of the energy-based function are entirely independent of the number of data instances. This guarantees that our differential privacy preserving framework can be applied to large datasets without consuming excessive privacy budgets. This is a substantial result when working with complex models, such as deep neural networks on large-scale datasets. It is worth noting that nonlinear activation functions, which are continuously differentiable [Stone-Weierstrass Theorem (Rudin 1976)] and satisfy the Riemann-integrable condition, can be approximated by using the Chebyshev Expansion. Therefore, our framework can be applied given such activation functions as, e.g., tanh, arctan, sigmoid, softsign, sinusoid, sinc, Gaussian, etc. (Wikipedia 2016). In the experiment section, we will show that our approach leads to accurate results.
Note that the proofs of Lemmas 2 and 3 do not depend on the assumptions that the data features are non-negative and that the target follows a binomial distribution. The proofs are generally applicable to inputs and targets that are not restricted by any constraint. As shown in the next section, our approach works efficiently on a multi-class classification task on the MNIST dataset (Lecun et al. 1998). The cross-entropy error function is applied in the softmax layer.
4 Experiments
To validate our approach, we have conducted extensive experiments on well-known and large-scale datasets, including a health social network, YesiWell data (Phan et al. 2016c), and a handwritten digit dataset, MNIST (Lecun et al. 1998). Our validation focuses on four key issues: (1) the effectiveness and robustness of our pCDBN model; (2) the effects of our model and hyperparameter selections, including the use of the Chebyshev polynomial, the impact of the polynomial degree L, and the effect of the probabilities \(P(h^k_{ij}=1 \vert v)\) in approximating the energy function; (3) the ability of our model to work on large-scale datasets; and (4) the benefits of privacy budget consumption being independent of the number of training epochs.
We carry out the validation through three approaches. The first is to conduct human behavior prediction with various settings of data cardinality, privacy budget \(\epsilon \), noisy versus noiseless models, and original versus approximated models. By this we rigorously examine the effectiveness of our model compared with the state-of-the-art algorithms, i.e., Phan et al. (2016c) and Abadi et al. (2016). The second approach is to discover gold standards in our model configuration by examining various settings of hyperparameters. The third approach is to assess the benefits of our pCDBN model's privacy budget consumption being independent of the number of training epochs. Specifically, we present the prediction accuracies of our pCDBN and existing algorithms as a function of the number of training epochs.
4.1 Human behavior modeling
In this experiment, we have developed a private convolutional deep belief network (pCDBN) for human behavior prediction and classification tasks in the YesiWell health social network (Phan et al. 2016c).
Health social network data. To be able to compare our model with the state-of-the-art deep private auto-encoders for human behavior prediction (dPAH), we use the same dataset used in Phan et al. (2016c). Data were collected from Oct 2010 to Aug 2011 as a collaboration between PeaceHealth Laboratories, SK Telecom Americas, and the University of Oregon to record daily physical activities, social activities (text messages, competitions, etc.), biomarkers, and biometric measures (cholesterol, BMI, etc.) for a group of 254 overweight and obese individuals. Physical activities, including information about the number of walking and running steps, were reported via a mobile device carried by each user. All users enrolled in an online social network, allowing them to friend and communicate with each other. Users’ biomarkers and biometric measures were recorded via daily/weekly/monthly medical tests performed at home individually or at our laboratories.

Behaviors: #competitions joined, #exercising days, #goals set, #goals achieved, \(\sum \)(distances), avg(speeds);

#Inbox Messages: Encouragement, Fitness, Followup, Competition, Games, Personal, Study protocol, Progress report, Technique, Social network, Meetups, Goal, Wellness meter, Feedback, Heckling, Explanation, Invitation, Notice, Technical fitness, Physical;

Biomarkers and Biometric Measures: Wellness Score, BMI, BMI slope, Wellness Score slope.
The model includes two hidden layers. We trained 10 first layer bases, each \(4 \times 12\) variables v, and 10 second layer bases, each \(2 \times 6\). The pooling ratio was 2 for both layers. In our work, the contrastive divergence algorithm (Hinton 2002) was used to optimize the energy function, and back-propagation was used to optimize the cross-entropy error function in the softmax layer. The implementations of our models using TensorFlow^{1} and Python were made publicly available on GitHub.^{2} The results and algorithms can be reproduced on either a single workstation or a Hadoop cluster. To examine the effectiveness of our pCDBN, we established two experiments, i.e., prediction and classification, as follows.
4.1.1 Human behavior prediction
The number of previous time intervals N is set to 4. N is used as a time window to generate training samples. For instance, given 10 days of data (\(\mathcal {M} = 10\)), a time window of 4 days (\(N = 4\)), and d data features, e.g., BMI, #steps, etc., a single input V will be a \(d \times N (= d \times 4)\) matrix. A single input V is considered as a data sample to model human behavior in our prediction model. If we move the window N over the 10 days of data, i.e., \(\mathcal {M}\), we will have \(\mathcal {M} - N + 1\) training samples for each individual, i.e., \(10 - 4 + 1 = 7\) in this example. So, we have, in total, \(254 \times (\mathcal {M} - N + 1) = 254 \times 7 = 1{,}778\) training samples for every 10 days of data \(\mathcal {M}\) to predict whether an individual will increase physical activity in the next day \(t + 1\).
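The window arithmetic above can be checked with a few lines of Python. This is only a sketch of the sample-counting logic: the helper name `sliding_windows` and the one-feature-per-day toy series are hypothetical, and each window here is stored as N rows of features, i.e., the transpose of the \(d \times N\) input V described in the text.

```python
def sliding_windows(series, window):
    """Return every run of `window` consecutive time steps from `series`."""
    return [series[start:start + window]
            for start in range(len(series) - window + 1)]

num_days = 10    # M in the text
window = 4       # N in the text
num_users = 254  # individuals in the dataset

# One hypothetical user: each day is a feature vector, here just [day_index].
days = [[day] for day in range(num_days)]
samples = sliding_windows(days, window)

samples_per_user = len(samples)                 # M - N + 1
total_samples = num_users * samples_per_user    # 254 * (M - N + 1)
```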
The Chebyshev polynomial approximation degree L and the learning rate are set to 7 and \(10^{-3}\), respectively. To avoid over-fitting, we apply L1-regularization and the dropout technique (Srivastava et al. 2014), with the dropout probability set to 0.5. Regarding \(\mathcal {K}\)-fold cross-validation or bootstrapping, it is either unnecessary or impractical to apply them in deep learning, and particularly in our study (Bengio 2017; Reed et al. 2014). This is because: (1) it is too expensive and time consuming to train \(\mathcal {K}\) deep neural networks, each of which usually has a large number of parameters, e.g., hundreds of thousands of parameters (Bengio 2017); and (2) bootstrapping is only used to train neural networks when class labels may be missing, objects in the image may not be localized, and in general, the labeling may be subjective, noisy, and incomplete (Reed et al. 2014). This is out of the scope of our focus. Our models were trained on an NVIDIA GTX TITAN X graphics card with 12 GB RAM and 3072 CUDA cores.
 (a)
Deep learning models for human behavior prediction, including: (1) the original convolutional deep belief network (CDBN) for human behavior prediction without enforcing differential privacy; (2) the truncated version of the CDBN, in which the energy function is approximated without injecting noise to preserve differential privacy, denoted TCDBN; and (3) the conditional Restricted Boltzmann Machine, denoted SctRBM (Li et al. 2014). None of these models enforces \(\epsilon \)-differential privacy.
 (b)
Deep Private Auto-Encoder (dPAH) (Phan et al. 2016c), which is the state-of-the-art deep learning model under differential privacy for human behavior prediction. The dPAH model outperforms general methods for regression analysis under \(\epsilon \)-differential privacy, i.e., functional mechanism (Zhang et al. 2012), DPME (Lei 2011), and filter-priority (Cormode 2011). Therefore, we only compare our model with the dPAH.
4.1.2 Human behavior classification
In this experiment, we aim to examine: (1) The robustness of our approach when it is trained with a large number of epochs at different noise levels; and (2) The effectiveness of different approximation approaches, including Chebyshev, Taylor, and Piecewise approximations. Our experiment setting is as follows:
We consider every pair (u, t) to be a data point. Given that t is a week, we have, in total, 9652 data points (254 users \(\times \) 38 weeks). We randomly select 10% of the data points as a testing set, and the remaining data points are used as a training set. At each training step, the model is trained with 111 randomly selected data points, i.e., batch size \(= 111\). To avoid imbalance in the data, each training batch consists of a balanced number of data samples from the different data classes. With this technique, data points in the underrepresented class can be incidentally sampled more often than the others (Brownlee 2015). The model is used to classify the statuses of all the users given their features. In this experiment, we compare our model with state-of-the-art polynomial approximation approaches in digital implementations, including the truncated Taylor expansion: \(\sigma (x) = \tanh x \approx x - \frac{x^3}{3} + \frac{2x^5}{15}\) (Lee and Jeng 1998; Vlcek 2012) (pCDBN_TE), and the linear piecewise approximation: \(\sigma (x) \approx c_1 x + c_2\) (Armato et al. 2009) (pCDBN_PW). Other baseline models, i.e., dPAH and SctRBM, cannot be directly applied to this task, so we do not include them in this experiment.
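The balanced batching described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `balanced_batch`, the three-class toy labels, and sampling with replacement are assumptions made for the example; the text only specifies a batch size of 111 with an equal share per class.

```python
import random
from collections import defaultdict

def balanced_batch(labels, batch_size, rng):
    """Return indices for one batch with an equal share from every class.

    Sampling is done with replacement (an assumption), so points from an
    underrepresented class can appear more often than the others.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    classes = sorted(by_class)
    if batch_size % len(classes) != 0:
        raise ValueError("batch_size must be divisible by the number of classes")
    per_class = batch_size // len(classes)
    batch = []
    for label in classes:
        batch.extend(rng.choices(by_class[label], k=per_class))
    rng.shuffle(batch)
    return batch

# Hypothetical, imbalanced toy labels: class 2 is rare.
rng = random.Random(0)
labels = [0] * 500 + [1] * 300 + [2] * 30
batch = balanced_batch(labels, batch_size=111, rng=rng)
counts = {c: sum(labels[i] == c for i in batch) for c in (0, 1, 2)}
print(counts)  # each class contributes 111 / 3 = 37 indices
```

Because the rare class has only 30 members in this toy set, some of its points necessarily repeat within a batch, which is the oversampling effect the text mentions.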
\(\bullet \) Figure 4 shows classification accuracies for different levels of privacy budget \(\epsilon \). Each plot illustrates the evolution of the testing accuracy of each algorithm and its power fit curve as a function of the number of epochs. After 600 epochs, our pCDBN can achieve 88% with \(\epsilon = 0.1\), 92% with \(\epsilon = 2\), and 94% with \(\epsilon = 8\). In addition, our model outperforms the baseline approaches, i.e., pCDBN_TE and pCDBN_PW, and the results are statistically significant (\(p = 4.4293 \times 10^{-7}\), paired t test). An important observation from this result is that the Chebyshev polynomial approximation is more effective than the competing approaches in preserving differential privacy in convolutional deep belief networks. One reason is that the Chebyshev polynomial approximation incurs smaller errors than the other two approaches (Harper 2012; Vlcek 2012). Much as relevance is propagated layer by layer in Layer-wise Relevance Propagation (Bach et al. 2015), approximation errors propagate across neural layers. Therefore, the smaller the error, the more accurate the models will be.
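To make the error comparison concrete, the following sketch measures the worst-case error of the truncated Taylor expansion of \(\tanh x\) against a Chebyshev approximation of the same degree. It is an illustration under stated assumptions, not the paper's implementation: the interval \([-1, 1]\), the degree 5, the use of Chebyshev interpolation at Chebyshev nodes, and the helper names are all choices made for this example.

```python
import math

def chebyshev_coefficients(func, degree):
    """Coefficients c_0..c_degree of the Chebyshev interpolant on [-1, 1]."""
    n = degree + 1
    thetas = [math.pi * (j + 0.5) / n for j in range(n)]
    return [
        (2.0 / n) * sum(func(math.cos(t)) * math.cos(k * t) for t in thetas)
        for k in range(n)
    ]

def chebyshev_eval(coeffs, x):
    """Evaluate c_0/2 + sum_k c_k T_k(x), using T_k(x) = cos(k * arccos(x))."""
    theta = math.acos(max(-1.0, min(1.0, x)))
    return 0.5 * coeffs[0] + sum(
        c * math.cos(k * theta) for k, c in enumerate(coeffs) if k > 0
    )

def taylor_tanh(x):
    """Truncated Taylor expansion of tanh, as in the pCDBN_TE baseline."""
    return x - x**3 / 3 + 2 * x**5 / 15

# Evenly spaced evaluation grid on the assumed interval [-1, 1].
grid = [-1 + 2 * i / 2000 for i in range(2001)]
coeffs = chebyshev_coefficients(math.tanh, degree=5)
taylor_err = max(abs(taylor_tanh(x) - math.tanh(x)) for x in grid)
cheb_err = max(abs(chebyshev_eval(coeffs, x) - math.tanh(x)) for x in grid)
print(f"max |error| on [-1, 1]: Taylor {taylor_err:.4f}, Chebyshev {cheb_err:.4f}")
```

The Taylor expansion is exact at the origin but degrades toward the ends of the interval, whereas the Chebyshev approximation spreads its error across the whole interval, which is why its worst-case error is smaller at equal degree.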
Note that our observations (i.e., data points) in the YesiWell data are not strictly independent. Therefore, the simple use of paired t test may not give rigorous conclusions. However, the very small p values under the paired t test can still indicate the significant improvement of our approach over baselines.
4.2 Handwriting digit recognition
To further demonstrate the ability to work on large-scale datasets, we conducted additional experiments on the well-known MNIST dataset (Lecun et al. 1998). The MNIST database of handwritten digits consists of 60,000 training examples and a test set of 10,000 examples (Lecun et al. 1998). Each example is a 28 \(\times \) 28 gray-level image. The MNIST dataset is approximately balanced, with roughly 6000 training images for each of the 10 categories.
We compare our model with the private stochastic gradient descent algorithm, denoted pSGD, from Abadi et al. (2016). The pSGD is the state-of-the-art algorithm in preserving differential privacy in deep learning. pSGD is an advanced version of Shokri and Shmatikov (2015); therefore, there is no need to include the work proposed by Shokri and Shmatikov (2015) in our experiments. The two approaches, i.e., our proposed algorithm and the pSGD, are built on the same structure of a convolutional deep belief network. As in prior work (Abadi et al. 2016), we apply two convolution layers, one with 32 features and one with 64 features, in which each hidden neuron is connected to a 5 \(\times \) 5 unit patch. On top of the convolution layers, there are a fully-connected layer with 1024 units and a softmax over 10 classes (corresponding to the 10 digits) with cross-entropy loss.
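For a sense of scale, the parameter count of such an architecture can be worked out with a few lines of arithmetic. The sketch below is only indicative: stride-1 convolutions with zero padding that preserves the spatial size, a 2 \(\times \) 2 pooling step after each convolution layer, and one bias per unit are assumptions that the text does not state.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights of a kernel x kernel convolution plus one bias per feature map."""
    return kernel * kernel * in_channels * out_channels + out_channels

def dense_params(n_in, n_out):
    """Weights of a fully-connected layer plus one bias per output unit."""
    return n_in * n_out + n_out

side = 28                      # MNIST images are 28 x 28 with one gray-level channel
conv1 = conv_params(5, 1, 32)
side //= 2                     # assumed 2 x 2 pooling: 28 -> 14
conv2 = conv_params(5, 32, 64)
side //= 2                     # assumed 2 x 2 pooling: 14 -> 7
flat = side * side * 64        # inputs to the fully-connected layer
fc = dense_params(flat, 1024)
softmax = dense_params(1024, 10)
total = conv1 + conv2 + fc + softmax
print(conv1, conv2, fc, softmax, total)
```

Under these assumptions almost all parameters sit in the fully-connected layer, and the total is in the millions, which is consistent with the earlier remark that training many copies of such a network is expensive.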
\(\bullet \) Figure 5b demonstrates the benefit of being independent of the number of training epochs in consuming the privacy budget of our mechanism. In this experiment, \(\epsilon \) is set to 0.5, i.e., large injected noise. The pSGD achieves higher prediction accuracies after using a small number of training epochs, i.e., 88.75% after 18 epochs, compared with the pCDBN. More epochs cannot be used to train the pSGD, since it will violate the privacy protection guarantee. Meanwhile, our model, the pCDBN, can be trained with an unlimited number of epochs. After a certain number of training epochs, i.e., 162 epochs, the pCDBN outperforms the pSGD in terms of prediction accuracy, with 91.71% compared with 88.75%.
Our experimental results clearly show that our mechanism can work with large-scale datasets. In addition, its consumption of the privacy budget \(\epsilon \) is independent of the number of training epochs, which is a significant property. Our mechanism is the first of its kind offering this distinctive ability.
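The reason the budget does not grow with the number of epochs is that noise is injected once into the coefficients of a polynomial objective, after which training only touches the noisy coefficients. The sketch below shows this pattern on a deliberately simple stand-in, a one-dimensional linear regression in the style of the functional mechanism (Zhang et al. 2012), not on the CDBN energy function. The function names, the synthetic data, and the bounds \(|x_i| \le 1\), \(|y_i| \le 1\) are assumptions of the example.

```python
import random

def laplace(scale, rng):
    """Laplace(0, scale) sample: the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_objective(xs, ys, epsilon, rng):
    """Perturb the coefficients of sum_i (y_i - w * x_i)^2 exactly once.

    Expanding the loss gives a*w^2 + b*w + const with a = sum x_i^2 and
    b = -2 * sum x_i*y_i. With |x_i| <= 1 and |y_i| <= 1, the sensitivity
    bound 2 * (1 + 2 + 1) = 8 of Zhang et al. (2012) applies; it also counts
    the constant term, so it is conservative here.
    """
    sensitivity = 8.0
    a = sum(x * x for x in xs) + laplace(sensitivity / epsilon, rng)
    b = -2.0 * sum(x * y for x, y in zip(xs, ys)) + laplace(sensitivity / epsilon, rng)
    return a, b

def train(a, b, n, epochs, lr=0.5):
    """Gradient descent on the noisy objective; the raw data are never read."""
    w = 0.0
    for _ in range(epochs):
        w -= lr * (2.0 * a * w + b) / n
    return w

# Synthetic data with slope 0.6 (hypothetical), clipped to the assumed bounds.
rng = random.Random(7)
xs = [rng.uniform(-1, 1) for _ in range(2000)]
ys = [max(-1.0, min(1.0, 0.6 * x + rng.gauss(0, 0.05))) for x in xs]
a, b = noisy_objective(xs, ys, epsilon=0.5, rng=rng)   # budget is spent here, once
w_short = train(a, b, len(xs), epochs=20)
w_long = train(a, b, len(xs), epochs=2000)             # no additional budget consumed
print(w_short, w_long, -b / (2 * a))
```

Since `train` is a function of the released coefficients only, running it for 20 or 2000 epochs is post-processing of the same differentially private output. A gradient-perturbation method, by contrast, adds fresh noise at every step and therefore spends budget with every epoch.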
\(\bullet \) The impact of polynomial degree L: Figure 6 shows the prediction accuracies of our model for different values of L on the MNIST dataset (Lecun et al. 1998). After a certain number of training epochs, the impact of L is not significant when L is larger than or equal to 3. In fact, the models with \(L \ge 3\) converge to similar prediction accuracies after 162 training epochs. The difference is notable only with small numbers of training epochs. With L larger than 7, the prediction accuracies are very much the same; therefore, we do not show them in Fig. 6. Our observation can serve as a guideline for selecting L when approximating energy functions based on Chebyshev polynomials.
\(\bullet \) Computational performance: Given the MNIST dataset, it takes an average of 761 s to train our model, after 162 epochs, by using a GPU (NVIDIA GTX TITAN X, 12 GB RAM with 3072 CUDA cores). Meanwhile, training the pSGD is faster than training our model, since only a small number of training epochs is needed to train the pSGD. On average, training the pSGD takes 86 s, after 18 training epochs. For the YesiWell dataset, training our pCDBN model takes an average of 2910 s, after 600 epochs, compared with 2141 s for the dPAH model.
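One way to see why larger degrees bring diminishing returns is to look at how quickly the Chebyshev approximation error of a smooth activation shrinks with L. The sketch below does this for the sigmoid on \([-1, 1]\). The choice of function, the interval, and the helper name `chebyshev_max_error` are assumptions made for illustration, and the numbers concern approximation error only, not classification accuracy.

```python
import math

def chebyshev_max_error(func, degree, samples=2001):
    """Max error on [-1, 1] of the Chebyshev interpolant of the given degree."""
    n = degree + 1
    thetas = [math.pi * (j + 0.5) / n for j in range(n)]
    coeffs = [
        (2.0 / n) * sum(func(math.cos(t)) * math.cos(k * t) for t in thetas)
        for k in range(n)
    ]
    worst = 0.0
    for i in range(samples):
        x = -1.0 + 2.0 * i / (samples - 1)
        theta = math.acos(max(-1.0, min(1.0, x)))
        # p(x) = c_0 / 2 + sum_{k >= 1} c_k * T_k(x), with T_k(x) = cos(k * theta)
        approx = 0.5 * coeffs[0] + sum(
            coeffs[k] * math.cos(k * theta) for k in range(1, n)
        )
        worst = max(worst, abs(approx - func(x)))
    return worst

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

errors = {L: chebyshev_max_error(sigmoid, L) for L in (1, 3, 5, 7)}
for L, err in errors.items():
    print(f"L = {L}: max |error| = {err:.2e}")
```

For a smooth function like this the error drops rapidly with each increase in degree, so beyond a small L the remaining approximation error is likely to be dominated by other sources, such as the injected noise.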
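Dividing the reported totals by the number of epochs gives a rough per-epoch cost. This is simple arithmetic on the figures quoted above, and it assumes that the averaged wall-clock times are directly comparable across runs.

```python
# (total seconds, epochs) as reported in the text
runs = {
    "pCDBN on MNIST": (761, 162),
    "pSGD on MNIST": (86, 18),
    "pCDBN on YesiWell": (2910, 600),
}
per_epoch = {name: secs / epochs for name, (secs, epochs) in runs.items()}
for name, value in per_epoch.items():
    print(f"{name}: {value:.2f} s per epoch")
```

On these numbers the per-epoch cost of the pCDBN and the pSGD on MNIST is similar (about 4.7 to 4.8 s), so the longer total training time of the pCDBN comes from running more epochs rather than from more expensive epochs.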
5 Conclusions and discussions
In this paper, we propose a novel framework for developing convolutional deep belief networks under differential privacy. Our approach conducts both sensitivity analysis and noise insertion on the energy-based objective functions. Distinctive characteristics offered by our model include: (1) It is totally independent of the number of training epochs in consuming privacy budget; (2) It has the potential to be applied in typical energy-based deep neural networks; (3) Nonlinear activation functions, which are continuously differentiable [Stone-Weierstrass Theorem (Rudin 1976)] and satisfy the Riemann-integrable condition, e.g., tanh, arctan, sigmoid, softsign, sinusoid, sinc, Gaussian, etc. (Wikipedia 2016), can be applied; and (4) It has the ability to work with large-scale datasets. With these fundamental abilities, our framework could significantly improve the applicability of differential privacy preservation in deep learning. To illustrate the effectiveness of our framework, we propose a novel model based on our private convolutional deep belief network (pCDBN), for human behavior modeling. Experimental evaluations on a health social network, YesiWell data, and a handwriting digit dataset, MNIST data, validate our theoretical results and the effectiveness of our approach.
In future work, it is worthwhile to study how we might be able to extract private information from deep neural networks. We will also examine potential approaches to preserve differential privacy in more complex deep learning models, such as Long Short-Term Memory (LSTM) (Hochreiter and Schmidhuber 1997). Another open direction is how to adapt our framework to multi-party computational settings, in which multiple parties can jointly train a deep learning model under differential privacy. Innovative multi-party computational protocols for deep learning under differential privacy must have the ability to work with large-scale datasets.
In principle, our mechanism can be applied on rectified linear units (ReLUs) (Glorot et al. 2011) and on parametric rectified linear units (PReLUs) (He et al. 2015). The main difference is that we do not need to approximate the energy function. This is because the energy function is a polynomial function when applying ReLU units. However, we need to add a local response normalization (LRN) layer (Krizhevsky et al. 2012) to bound the values of hidden neurons. This is a common step when dealing with ReLU units. The implementation of this layer and ReLU units under differential privacy is an exciting opportunity for other researchers in future work.
Another challenging problem is identifying the exact risk of re-identification/reconstruction of the data under differential privacy. In Lee and Clifton (2012), the authors proposed differential identifiability to link individual identifiability to \(\epsilon \)-differential privacy. However, this is still a non-trivial question. One ambitious approach is to design methods that reconstruct the original models from noisy deep neural networks. One could then use the original models to infer sensitive information in the training data. However, how to reconstruct the original models from differentially private deep neural networks remains an open question, and answering it will require a significant effort from the whole community.
Acknowledgements
This work is supported by the NIH Grant R01GM103309 to the SMASH project. Wu is also supported by NSF Grant 1502273 and 1523115. Dou is also supported by NSF Grant 1118050. We thank Xiao Xiao and Rebeca Sacks for their contributions.
References
 Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., & Zhang, L. (2016). Deep learning with differential privacy. arXiv:1607.00133.
 Arfken, G. (1985). Mathematical methods for physicists (3rd ed.). Cambridge: Academic Press.
 Armato, A., Fanucci, L., Pioggia, G., & Rossi, D. D. (2009). Low-error approximation of artificial neuron sigmoid function and its derivative. Electronics Letters, 45(21), 1082–1084.
 Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K. R., & Samek, W. (2015). On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE, 10(7), e0130140.
 Bandura, A. (1989). Human agency in social cognitive theory. The American Psychologist, 44(9), 1175.
 Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1), 1–127. doi: 10.1561/2200000006.
 Bengio, Y. (2017). Is cross-validation heavily used in deep learning or is it too expensive to be used? Quora. https://wwwquoracom/IscrossvalidationheavilyusedinDeepLearningorisittooexpensivetobeused.
 Bengio, Y., Lamblin, P., Popovici, D., & Larochelle, H. (2007). Greedy layer-wise training of deep networks. In NIPS.
 Brownlee, J. (2015). 8 tactics to combat imbalanced classes in your machine learning dataset. http://machinelearningmastery.com/tacticstocombatimbalancedclassesinyourmachinelearningdataset/.
 Chan, T. H. H., Li, M., Shi, E., & Xu, W. (2012). Differentially private continual monitoring of heavy hitters from distributed streams. In PETS’12 (pp. 140–159).
 Chaudhuri, K., & Monteleoni, C. (2008a). Privacy-preserving logistic regression. In NIPS (pp. 289–296).
 Chaudhuri, K., & Monteleoni, C. (2008b). Privacy-preserving logistic regression. In NIPS’08 (pp. 289–296).
 Cheng, Y., Wang, F., Zhang, P., & Hu, J. (2016). Risk prediction with electronic health records: A deep learning approach. In SDM’16.
 Choi, E., Schuetz, A., Stewart, W. F., & Sun, J. (2016). Using recurrent neural network models for early detection of heart failure onset. Journal of the American Medical Informatics Association. doi: 10.1093/jamia/ocw112.
 Cormode, G. (2011). Personal privacy vs population privacy: Learning to attack anonymization. In KDD’11 (pp. 1253–1261).
 Dowlin, N., Gilad-Bachrach, R., Laine, K., Lauter, K., Naehrig, M., & Wernsing, J. (2016). Cryptonets: Applying neural networks to encrypted data with high throughput and accuracy. In Proceedings of the 33rd international conference on machine learning, PMLR, proceedings of machine learning research (Vol. 48, pp. 201–210).
 Dwork, C., & Lei, J. (2009). Differential privacy and robust statistics. In STOC’09 (pp. 371–380).
 Dwork, C., McSherry, F., Nissim, K., & Smith, A. (2006). Calibrating noise to sensitivity in private data analysis. Theory of Cryptography, 3876, 265–284.
 Erlingsson, U., Pihur, V., & Korolova, A. (2014). Rappor: Randomized aggregatable privacy-preserving ordinal response. In CCS’14 (pp. 1054–1067).
 Fang, R., Pouyanfar, S., Yang, Y., Chen, S. C., & Iyengar, S. S. (2016). Computational health informatics in the big data age: A survey. ACM Computing Surveys, 49(1), 12:1–12:36. doi: 10.1145/2932707.
 Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep sparse rectifier neural networks. In Aistats (Vol. 15, p. 275).
 Gottlieb, A., Stein, G. Y., Ruppin, E., Altman, R. B., & Sharan, R. (2013). A method for inferring medical diagnoses from patient similarities. BMC Medicine, 11(1), 194. doi: 10.1186/1741701511194.
 Harper, T. (2012). A comparative study of function approximators involving neural networks. Thesis, Master of Science, University of Otago. http://hdl.handle.net/10523/2397.
 Hay, M., Rastogi, V., Miklau, G., & Suciu, D. (2010). Boosting the accuracy of differentially private histograms through consistency. Proceedings of the VLDB Endowment, 3(1), 1021–1032.
 He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. CoRR abs/1502.01852. http://arxiv.org/abs/1502.01852.
 Helmstaedter, M., Briggman, K. L., Turaga, S. C., Jain, V., Seung, H. S., & Denk, W. (2013). Connectomic reconstruction of the inner plexiform layer in the mouse retina. Nature, 500(7461), 168–174.
 Hinton, G. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8), 1771–1800.
 Hinton, G. E., Osindero, S., & Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527–1554. doi: 10.1162/neco.2006.18.7.1527.
 Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780. doi: 10.1162/neco.1997.9.8.1735.
 Jain, P., Kothari, P., & Thakurta, A. (2012). Differentially private online learning. In COLT’12 (pp. 24.1–24.34).
 Jamoom, E. W., Yang, N., & Hing, E. (2016). Adoption of certified electronic health record systems and electronic information sharing in physician offices: United States, 2013 and 2014. NCHS Data Brief, 236, 1–8.
 Kifer, D., & Machanavajjhala, A. (2011). No free lunch in data privacy. In SIGMOD’11 (pp. 193–204).
 Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097–1105).
 LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444. doi: 10.1038/nature14539.
 Lecun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324. doi: 10.1109/5.726791.
 Lee, J., & Clifton, C. (2012). Differential identifiability. In The 18th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’12, Beijing, China, 12–16 August 2012 (pp. 1041–1049).
 Lee, T., & Jeng, J. (1998). The Chebyshev-polynomials-based unified model neural networks for function approximation. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 28(6), 925–935.
 Lee, H., Grosse, R., Ranganath, R., & Ng, A. Y. (2009). Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In ICML’09 (pp. 609–616).
 Lei, J. (2011). Differentially private M-estimators. In NIPS (pp. 361–369).
 Leung, M. K. K., Xiong, H. Y., Lee, L. J., & Frey, B. J. (2014). Deep learning of the tissue-regulated splicing code. Bioinformatics, 30(12), i121–i129. doi: 10.1093/bioinformatics/btu277.
 Li, H., Li, X., Ramanathan, M., & Zhang, A. (2015). Prediction and informative risk factor selection of bone diseases. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 12(1), 79–91. doi: 10.1109/TCBB.2014.2330579.
 Li, X., Du, N., Li, H., Li, K., Gao, J., & Zhang, A. (2014). A deep learning approach to link prediction in dynamic networks. In SIAM’14 (pp. 289–297).
 Liu, S., Liu, S., Cai, W., Pujol, S., Kikinis, R., & Feng, D. (2014). Early diagnosis of Alzheimer’s disease with deep learning. In IEEE 11th international symposium on biomedical imaging, ISBI 2014, Beijing, China (pp. 1015–1018). doi: 10.1109/ISBI.2014.6868045.
 Ma, J., Sheridan, R. P., Liaw, A., Dahl, G. E., & Svetnik, V. (2015). Deep neural nets as a method for quantitative structure-activity relationships. Journal of Chemical Information and Modeling, 55(2), 263–274. doi: 10.1021/ci500747n.
 Mason, J., & Handscomb, D. (2002). Chebyshev polynomials. Boca Raton: CRC Press. https://books.google.com/books?id=8FHf0P3to0UC.
 McSherry, F., & Mironov, I. (2009). Differentially private recommender systems. In KDD’09, ACM.
 McSherry, F., & Talwar, K. (2007a). Mechanism design via differential privacy. In 48th annual IEEE symposium on foundations of computer science (FOCS 2007), 20–23 October 2007, Providence, RI, USA, Proceedings (pp. 94–103).
 McSherry, F., & Talwar, K. (2007b). Mechanism design via differential privacy. In FOCS ’07 (pp. 94–103).
 Miotto, R., Li, L., Kidd, B. A., & Dudley, J. T. (2016). Deep patient: An unsupervised representation to predict the future of patients from the electronic health records. Scientific Reports, 6, 26094. doi: 10.1038/srep26094.
 Nissim, K., Raskhodnikova, S., & Smith, A. (2007). Smooth sensitivity and sampling in private data analysis. In Proceedings of the thirty-ninth annual ACM symposium on theory of computing (pp. 75–84), ACM.
 Ortiz, A., Munilla, J., Górriz, J. M., & Ramírez, J. (2016). Ensembles of deep learning architectures for the early diagnosis of the Alzheimer’s disease. International Journal of Neural Systems, 26(07), 1650025. doi: 10.1142/S0129065716500258.
 Perotte, A., Pivovarov, R., Natarajan, K., Weiskopf, N., Wood, F., & Elhadad, N. (2014). Diagnosis code assignment: models and evaluation metrics. Journal of the American Medical Informatics Association, 21(2), 231–237. doi: 10.1136/amiajnl2013002159.
 Perotte, A., Ranganath, R., Hirsch, J. S., Blei, D., & Elhadad, N. (2015). Risk prediction for chronic kidney disease progression using heterogeneous electronic health record data and time series analysis. Journal of the American Medical Informatics Association, 22(4), 872–880. doi: 10.1093/jamia/ocv024.
 Phan, N., Dou, D., Piniewski, B., & Kil, D. (2015a). Social restricted Boltzmann machine: Human behavior prediction in health social networks. In ASONAM’15 (pp. 424–431).
 Phan, N., Dou, D., Wang, H., Kil, D., & Piniewski, B. (2015b). Ontology-based deep learning for human behavior prediction in health social networks. In Proceedings of the 6th ACM conference on bioinformatics, computational biology and health informatics (pp. 433–442). doi: 10.1145/2808719.2808764.
 Phan, N., Dou, D., Piniewski, B., & Kil, D. (2016a). A deep learning approach for human behavior prediction with explanations in health social networks: social restricted Boltzmann machine (SRBM+). Social Network Analysis and Mining, 6(1), 79:1–79:14. doi: 10.1007/s1327801603790.
 Phan, N., Ebrahimi, J., Kil, D., Piniewski, B., & Dou, D. (2016b). Topic-aware physical activity propagation in a health social network. IEEE Intelligent Systems, 31(1), 5–14.
 Phan, N., Wang, Y., Wu, X., & Dou, D. (2016c). Differential privacy preservation for deep auto-encoders: An application of human behavior prediction. In AAAI’16 (pp. 1309–1316).
 Plis, S. M., Hjelm, D. R., Salakhutdinov, R., Allen, E. A., Bockholt, H. J., Long, J. D., et al. (2014). Deep learning for neuroimaging: A validation study. Frontiers in Neuroscience, 8, 229. doi: 10.3389/fnins.2014.00229.
 Reed, S. E., Lee, H., Anguelov, D., Szegedy, C., Erhan, D., & Rabinovich, A. (2014). Training deep neural networks on noisy labels with bootstrapping. CoRR abs/1412.6596.
 Rivlin, T. J. (1990). Chebyshev polynomials: From approximation theory to algebra and number theory (2nd ed.). New York: Wiley.
 Roumia, M., & Steinhubl, S. (2014). Improving cardiovascular outcomes using electronic health records. Current Cardiology Reports, 16(2), 451. doi: 10.1007/s1188601304516.
 Rudin, W. (1976). Principles of mathematical analysis. New York: McGraw-Hill.
 Shokri, R., & Shmatikov, V. (2015). Privacy-preserving deep learning. In CCS’15 (pp. 1310–1321).
 Smolensky, P. (1986). Information processing in dynamical systems: Foundations of harmony theory. In Parallel distributed processing: Explorations in the microstructure of cognition (Vol. 1, pp. 194–281).
 Song, S., Chaudhuri, K., & Sarwate, A. D. (2013). Stochastic gradient descent with differentially private updates. In GlobalSIP (pp. 245–248).
 Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15, 1929–1958. http://jmlr.org/papers/v15/srivastava14a.html.
 U.S. Department of Health and Human Services. (2016a). Health information technology for economic and clinical health (hitech) act. https://www.hhs.gov/hipaa/forprofessionals/specialtopics/HITECHactenforcementinterimfinalrule/.
 U.S. Department of Health and Human Services. (2016b). Health insurance portability and accountability act of 1996. http://www.hhs.gov/hipaa/.
 Vlcek, M. (2012). Chebyshev polynomial approximation for activation sigmoid function. Neural Network World, 4, 387–393.
 Wang, Y., Wu, X., & Wu, L. (2013). Differential privacy preserving spectral graph analysis. In PAKDD (2) (pp. 329–340).
 Wikipedia. (2016). Activation function. https://en.wikipedia.org/wiki/Activation_function.
 Wu, J., Roy, J., & Stewart, W. F. (2010). Prediction modeling using EHR data: Challenges, strategies, and a comparison of machine learning approaches. Medical Care, 48(6 Suppl), S106–S113. doi: 10.1097/mlr.0b013e3181de9e17.
 Xiao, X., Wang, G., & Gehrke, J. (2010). Differential privacy via wavelet transforms. In ICDE’10 (pp. 225–236).
 Xiong, H. Y., Alipanahi, B., Lee, L. J., Bretschneider, H., Merico, D., Yuen, R. K. C., et al. (2015). The human splicing code reveals new insights into the genetic determinants of disease. Science, 347(6218), 1254806. doi: 10.1126/science.1254806.
 Zhang, J., Zhang, Z., Xiao, X., Yang, Y., & Winslett, M. (2012). Functional mechanism: Regression analysis under differential privacy. PVLDB, 5(11), 1364–1375.