1 Introduction

In recent years, deep learning language models trained on large amounts of centralized data have achieved impressive performance on various NLP downstream tasks (e.g., medical case analysis [1], sentiment analysis [2], next-word prediction in mobile keyboards [3]). This success is mainly due to advanced machine learning techniques and large-scale data collection. For instance, Zeng et al. trained a medical language model on the Chinese MedDialog (MedDialog-CN) dataset, which contains 3.4 million real-world medical consultation records [4]. The medical language model can release encoded vector-form embeddings of symptom descriptions for developing intelligent medical systems. However, deep learning language models efficiently capture critical text information from the training dataset and retain sensitive information such as diseases in the vector-form embeddings, resulting in potential privacy disclosure risks. Pan et al. [5] proposed a simple exploratory analysis task to recover sensitive text information from a language model's text representation. Their experimental results demonstrate that an adversary with nearly zero domain knowledge can infer sensitive plain text information from unprotected text embeddings (see Fig. 1). Once an attacker can access the vector-form embeddings encoded by the medical language model, a massive privacy breach of patients may follow, and the patients would suffer direct and indirect damage to their reputation and property.

Fig. 1

Overview of the regular language model training and publishing process with sensitive text. The attacker can obtain the sensitive plain text from the unprotected text embedding

Several approaches have been proposed to provide privacy guarantees for language models and prevent privacy leakage. Differential privacy is the most commonly used method to protect the privacy of users' text data. An effective way to reduce the memorization of training data is to apply differentially private training techniques [6]. Shokri et al. [7] first trained differentially private deep models and used the Stochastic Gradient Descent (SGD) algorithm to optimize the model with formal differential privacy (i.e., DP-SGD). McMahan et al. [8] applied DP-SGD to train a Long Short-Term Memory (LSTM) language model and achieve a user-level privacy guarantee.

Recently, sizeable pre-trained language models such as the Bidirectional Encoder Representations from Transformers (BERT) [9] have provided state-of-the-art performance. However, BERT is too large to deploy on mobile devices. Sanh et al. [10] proposed a lightweight BERT language model pre-trained with knowledge distillation, called DistillBERT, which has fewer parameters and faster inference. Unfortunately, applying differentially private mechanisms to this pre-trained language model leads to a significant decline in prediction accuracy [11] because of the DistillBERT model's normalization layer.

Another way to tackle privacy concerns is sensitive data desensitization based on a generative model. Generative models such as Generative Adversarial Networks (GANs) [12] can learn the distribution of a realistic dataset and generate new synthetic data. Recently, there have been several studies on differentially private GANs [13]. Differentially private GANs can synthesize high-quality simulated data as a substitute for the original sensitive image data [14, 15]. Zhang et al. [16] proposed a differentially private generative adversarial network (DP-SeqGAN) that can generate new text data similar to the original data distribution. However, GAN-based models cannot be trained stably and effectively on text data. Moreover, such models consume large privacy budgets even under a small noise scale, so their privacy-preserving capability is questionable.

To generate synthetic, high-utility text data with privacy preservation, we propose the Differentially Private Recurrent Variational AutoEncoder (DP-RVAE), a text generation model based on the variational autoencoder [17], which is more suitable for text generation than GAN-based models. Our model can reconstruct the sensitive text and generate desensitized text data for language model training. DP-RVAE can be trained stably and efficiently while achieving a better privacy-accuracy trade-off.

Previous works try to protect privacy in a centralized setting. However, participants may be unwilling to transfer their data to a central server due to data privacy policies [18] and concerns about data abuse. Consequently, the central server cannot directly collect data from individual clients to train a well-performing language model. To address this problem, Federated Learning (FL) is a promising approach that utilizes data across multiple devices to train a deep learning model and achieve data-driven Deep Learning (DL) solutions [19]. A language model such as DistillBERT can be jointly trained on the individual client side with the federated learning paradigm while keeping personal data local and private from other users or a central server. Despite this property of FL, there is still a risk of data leakage: an attacker can recover data from the updated model parameters [20]. Furthermore, a model trained on a distributed dataset in the federated learning setting suffers a significant accuracy degradation compared to one trained on a centralized dataset [21]. To solve these problems, we extend DP-RVAE to the federated learning setting [22]; our framework allows DP-RVAE to be trained on each client and produce privacy-preserving synthetic text for NLP downstream tasks. The server can directly collect the synthetic text data produced by DP-RVAE from each client, and a language model such as DistillBERT can be trained more efficiently on the centralized synthetic text data than with typical NLP federated learning.

In summary, our main contributions are as follows:

  • We propose the Differentially Private Recurrent Variational AutoEncoder (DP-RVAE), a model that generates high-utility surrogate text data with a DP guarantee.

  • To improve the utility of the synthetic text data, we introduce the original text as the conditional input of the decoder module while applying noise perturbation and word dropout to preserve privacy.

  • We extend DP-RVAE to the federated learning scenario. DP-RVAE is deployed on the client side to learn individual features across multiple clients and generates high-utility synthetic text to train the language model on the central server while providing strong differential privacy guarantees for each client.

  • We evaluate the proposed DP-RVAE on the text classification task, achieving average test accuracy 5.90% and 3.94% higher than the typical approach in the centralized and federated learning scenarios, respectively, across various privacy budgets. We also evaluate the defense capability of DP-RVAE against keyword inference attacks. DP-RVAE lowers the average attack accuracy by 15.2% compared with the typical differentially private approach, and the attacker can barely infer sensitive information from the sentence embeddings.

The remainder of this paper is organized as follows. Section 2 reviews related work on generative models, differentially private approaches in the NLP field, and federated learning. Section 3 reviews the basics of differential privacy and the recurrent variational autoencoder model. Section 4 describes the details of the proposed DP-RVAE model and its extension to federated learning. Section 5 presents the utility and evaluation experiments for DP-RVAE and its federated learning extension. Conclusions and future work are given in Section 6.

2 Related Work

The natural language processing (NLP) field uses machines to analyze human language text data. NLP downstream tasks have improved significantly with pre-trained language models such as BERT, and many specific domains have begun to use such models in their tasks [23]. However, datasets from particular fields, such as medical records [24], are quite sensitive and private, which raises privacy concerns about collecting and sharing data to train the model. Recent works apply differential privacy mechanisms to prevent deep learning models from disclosing sensitive information in the training dataset [7]. Moreover, in a distributed scenario, FL methods enable many individual clients to train their models jointly while keeping their local data decentralized and private from a central server [25].

2.1 Data Generative Models

Generative models can learn a joint distribution and generate synthetic data. Generative models built on deep neural networks can approximate a likelihood function and take various forms [26, 27] for modelling high-dimensional data such as text, audio, and images. The Variational AutoEncoder (VAE) [27] is a variant of the regularized autoencoder and one of these generative models. The VAE architecture generally involves a deep latent-variable model and an inference model, allowing for efficient latent-variable inference and synthesis. The latent-variable model is a generative model over the dataset. The inference model, called the encoder, approximates the posterior distribution of the generative model's latent variables. The main difference from the original autoencoder lies in the encoder: in the original autoencoder, the encoder compresses the input data into a low-dimensional representation, whereas the encoder inside the VAE maps the input data to the parameters of a Gaussian probability density [28].

Dai et al. [29] proposed an innovative encoder-decoder architecture called the sequence-to-sequence model, which uses RNNs as the backbones of the encoder and decoder and has been successful in supervised NLP downstream tasks. Bowman et al. [30] adapted the sequence-to-sequence architecture and combined it with the variational autoencoder for text generation, proposing the Recurrent Variational AutoEncoder (RVAE) model, which can reconstruct an actual sentence and generate synthetic text. Subsequent studies [31,32,33] improved the RVAE and achieved better performance.

A related generative model, the Generative Adversarial Network (GAN) [26], has mainly been applied in the computer vision field [34]. A GAN is composed of a generator and a discriminator that play a zero-sum game (a concept from game theory) during the training phase. Through this adversarial process, the generator and discriminator can converge and achieve acceptable accuracy. However, training stability and time consumption remain significant challenges. Liu et al. [35] proposed an extended GAN architecture based on a self-conditioned method to stabilize GAN training. Zhao et al. [36] utilized a differentiable augmentation approach to enhance the efficiency of GAN training. Unfortunately, these attempts focus on the computer vision domain. In the NLP field, only a few studies [37, 38] address text generation, and these GAN-based models still cannot work well over an essentially discontinuous space. The stability and training-cost problems of GAN-based text generation have no practical solutions.

2.2 Natural Language Processing with Differential Privacy

Deep learning techniques can represent text data well in natural language processing (NLP). In many NLP tasks, the input text data is first encoded into a dense vector by a language model and then applied to NLP downstream tasks such as sentiment analysis and medical case analysis. Devlin et al. [9] showed that the text representation can significantly improve the downstream tasks' accuracy. In many works, researchers pre-train a language model on their private dataset to learn a text representation and publish it for a broad set of NLP tasks such as text classification.

However, text representations still pose a privacy leakage risk. Several experiments [39, 40] showed that users' private information can be easily extracted from the text representations produced by a language model via membership inference attacks and model inversion attacks. Preotiuc-Pietro et al. [41] showed that an author's information can be predicted from linguistic cues in the text. Furthermore, Pan et al. [5] used a model inversion attack to reconstruct sensitive information from the text representation. Carlini et al. [42] demonstrated that it is possible to attack widely used language models such as BERT through training data extraction techniques. Carlini et al. [43] pointed out that the privacy leakage from language models is due to the strong memorization ability of neural networks; more concretely, the text representation retains too much information from the training dataset.

To tackle the privacy concerns above, several studies trained differentially private models [7, 44] to provide a strong privacy guarantee. Feyisetan et al. [45] injected Laplacian noise into the word embedding vectors. McMahan et al. [8] applied differential privacy to model training for the next-word prediction task on user-adjacent datasets. From another perspective, training models via adversarial learning can also enhance the robustness and privacy of neural representations in language models [46, 47]. However, although the methods above can preserve private information in the text data, they decrease the accuracy of the downstream NLP task. Basu et al. [11] directly applied gradient noise to provide differential privacy protection for a recent pre-trained language model such as DistillBERT and observed that the accuracy decreases drastically.

Desensitizing the dataset is also a promising way to provide a privacy guarantee for sensitive datasets. Phan et al. [48] used an autoencoder model and introduced noise into the objective function to encode the raw dataset into dense representation vectors and publish the noised representation vectors for analysis tasks. In the same way, Chen et al. [49] perturbed the gradients while training a variational autoencoder to achieve a more robust guarantee for the training dataset.

Generating simulated data with a distribution similar to the original data is another way to desensitize a sensitive dataset [15, 50]. Chen et al. [51] trained a differentially private GAN model on the sensitive dataset, which can generate synthetic data without sensitive information to replace the realistic training dataset for a classification model. However, GAN-based models cannot be trained stably with gradient perturbation and have no practical results on discrete data such as text.

2.3 Federated Learning Framework

With the development of decentralized mobile devices and servers, privacy and security on mobile devices [52] have received much attention in both research and practice [53]. Federated learning is a promising privacy-preserving approach [54] that allows individual clients to collaboratively train models and share gradients to update the global model while keeping their data local. The objective of federated learning is to aggregate the local models from the clients and optimize the global model. Several federated learning algorithms have been proposed to optimize the global model. Federated Averaging (FedAvg) is a standard algorithm that divides the training phase into rounds and updates the global model [22]. Federated proximal (FedProx) adds a proximal term to the FedAvg algorithm and improves convergence stability [55]. Federated Optimization (FedOPT) is a generalized version of FedAvg and can converge more efficiently in fewer rounds [56]. In our work, we use the FedOPT algorithm to update the global model in the federated learning setting. The federated learning framework thus has a promising future for addressing the privacy risk of data collection [57], especially in the NLP field, where users are unwilling to upload sensitive text to a third-party central server.

However, there is still limited research on federated learning for NLP. Sui et al. [58] processed medical text data via federated learning for an extraction task. Liu et al. [54] used a pre-trained BERT model within a federated learning framework to analyze medical notes collected from multiple silos. Lin et al. [21] applied the BERT model to NLP downstream tasks in the federated learning setting; compared to the centralized setup, there is a considerable accuracy gap under the same dataset and task. In short, prior NLP works in federated learning mainly focus on their specific tasks and neglect the privacy threats from membership inference attacks [59] and reconstruction attacks [60].

3 Preliminaries

3.1 Differential Privacy

Differential privacy (DP) can provide strong privacy guarantees for sensitive data analysis. With differential privacy preservation, attackers can hardly recover the information in the dataset. Consider a randomized mechanism \({\mathscr{M}}\) with output range \(\mathcal {R}\). The mechanism \({\mathscr{M}}\) is (𝜖,δ)-DP [61] if it satisfies the following inequality, which is the formal definition of differential privacy.

$$ \operatorname{Pr}[\mathcal{M}(S) \in \mathcal{O}] \leq e^{\epsilon} \operatorname{Pr}\left[\mathcal{M}\left( S^{\prime}\right) \in \mathcal{O}\right]+\delta $$
(1)

For any two adjacent datasets S and \(S^{\prime }\) that differ by only one sample, the inequality holds for any subset of outputs \(\mathcal {O} \subseteq \mathcal {R}\). \(\operatorname {Pr}[{\mathscr{M}}(S) \in \mathcal {O}]\) is the probability that the algorithm produces an output in \(\mathcal {O}\). In our case, the recurrent variational autoencoder corresponds to the mechanism \({\mathscr{M}}\). The parameter 𝜖 is the privacy budget, an upper bound on the privacy loss. The parameter δ is the failure probability of the differential privacy mechanism \({\mathscr{M}}\). Smaller 𝜖 and δ give stronger privacy guarantees.

The typical differential privacy method for neural networks is to inject noise into the gradients during the training phase. The privacy budget is a metric that estimates the privacy preservation level of the deep learning model. The Renyi Differential Privacy (RDP) [62] accounting mechanism provides a tighter estimate of the privacy budget consumed when training a deep learning model. For any two adjacent datasets S and \(S^{\prime }\), a randomized mechanism \({\mathscr{M}}\) is (α,𝜖)-RDP if it satisfies the following equation.

$$ D_{\alpha}\left( \mathcal{M}(S) \| \mathcal{M}\left( S^{\prime}\right)\right) \leq \epsilon $$
(2)

where \(D_{\alpha }\left ({\mathscr{M}}(S) \| {\mathscr{M}}\left (S^{\prime }\right )\right )\) is the Renyi divergence, defined as follows, and the parameter α > 1 is the order of the RDP guarantee.

$$ D_{\alpha}(\mathcal{M}(S) \| \mathcal{M}(S^{\prime})) \triangleq \frac{1}{\alpha-1} \log E_{x \sim \mathcal{M}(S^{\prime})}\left( \frac{\mathcal{M}(S)}{\mathcal{M}(S^{\prime})}\right)^{\alpha} $$
(3)

The RDP provides the more convenient composition and post-processing properties to account for the privacy budget over a sequence of differentially private mechanisms.

Theorem 1 (Composition)

For a sequence of k mechanisms \({\mathscr{M}}_{1}, {\mathscr{M}}_{2}, \ldots , {\mathscr{M}}_{k}\), where each mechanism \({\mathscr{M}}_{i}\) satisfies (α,𝜖i)-RDP, the composition of \({\mathscr{M}}_{1}, {\mathscr{M}}_{2}, \ldots , {\mathscr{M}}_{k}\) satisfies \((\alpha , {\sum }_{i=1}^{k}\epsilon _{i})\text {-RDP}\).

Theorem 2 (Post-processing)

If a randomized mechanism \({\mathscr{M}}\) satisfies the (α,𝜖)-RDP, for any subsequent function F of mechanism \({\mathscr{M}}\) will satisfy the (α,𝜖)-RDP.
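
To make the accounting concrete, the following minimal sketch composes the per-step RDP of a Gaussian mechanism (Theorem 1) and converts the result to an (𝜖, δ)-DP guarantee with the standard conversion 𝜖 = 𝜖RDP + log(1/δ)/(α − 1). The per-step RDP formula α/(2σ²) assumes unit sensitivity and omits the subsampling amplification used by a full RDP accountant; the function names and numbers are illustrative, not part of our implementation.

```python
import numpy as np

# Hedged sketch: Renyi-DP accounting for T Gaussian mechanisms of unit
# sensitivity with noise multiplier sigma. Theorem 1 composes RDP additively
# over steps; the conversion eps = rdp + log(1/delta)/(alpha - 1) gives
# (eps, delta)-DP. Subsampling amplification is omitted here.

def gaussian_rdp(alpha: float, sigma: float) -> float:
    # RDP of one Gaussian mechanism (sensitivity 1) at order alpha.
    return alpha / (2.0 * sigma ** 2)

def compose_and_convert(steps: int, sigma: float, delta: float,
                        orders=np.arange(2, 64)) -> float:
    # Theorem 1 (composition): sum the per-step RDP at each order.
    rdp = np.array([steps * gaussian_rdp(a, sigma) for a in orders])
    # Convert each order to an (eps, delta)-DP guarantee and keep the tightest.
    eps = rdp + np.log(1.0 / delta) / (orders - 1)
    return float(eps.min())

# Example: 1,000 noisy steps with sigma = 1.0 and delta = 1e-5.
print(compose_and_convert(steps=1000, sigma=1.0, delta=1e-5))
```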

3.2 Recurrent Variational Autoencoder

The recurrent variational autoencoder can efficiently approximate inference in directed probabilistic models. Given an observed text dataset X = {x(1),x(2),…,x(N)}, each text sample x(i) is a sequence of words and can be denoted by \(\boldsymbol {x}^{(i)}=\left \{x_{1}, x_{2}, \ldots , x_{L}\right \}\), where L is the number of words in the text. The goal of the model is to estimate the parameters 𝜃 by maximizing the marginal log-likelihood.

$$ \log p_{\theta}(\boldsymbol{X})=\sum\limits_{n=1}^{N} \log {\int}_{\boldsymbol{z}} p(\boldsymbol{z}) p_{\theta}\left( \boldsymbol{x}^{(n)} \mid \boldsymbol{z}\right) \mathrm{d} \boldsymbol{z} $$
(4)

\(p(\boldsymbol {z})\) is the prior distribution of the latent variable z, where z is sampled from a multivariate diagonal Gaussian distribution. Because of the integral inside the marginal log-likelihood, the equation is intractable and we cannot directly use the gradient descent method to optimize the parameters 𝜃. We therefore optimize the evidence lower bound (ELBO) of the marginal log-likelihood, obtained by introducing an approximate posterior distribution qϕ(z∣x) of the true posterior p𝜃(z∣x).

$$ \begin{array}{@{}rcl@{}} \log p_{\theta}(\boldsymbol{x}) &\geq & \mathbb{E}_{q_{\phi}(\boldsymbol{z}\mid \boldsymbol{x})}\left[\log p_{\theta}(\boldsymbol{x} \mid \boldsymbol{z})\right] \\ &&-\mathcal{D}_{KL}\left( q_{\phi}(\boldsymbol{z} \mid \boldsymbol{x})\|p(\boldsymbol{z})\right) \end{array} $$
(5)

We use an encoder consisting of a single-layer LSTM combined with two fully-connected layers to predict the posterior distribution \(q_{\phi }\left (\boldsymbol {z}\mid \boldsymbol {x}\right )\). More concretely, the posterior distribution \(q_{\phi }\left (\boldsymbol {z}\mid \boldsymbol {x}\right )\) is assumed to be a multivariate diagonal Gaussian distribution.

$$ q_{\phi}(\boldsymbol{z}\mid\boldsymbol{x})=\mathcal{N}\left( \boldsymbol{z}; \mu_{\phi}(\boldsymbol{h}), \sigma_{\phi}(\boldsymbol{h})\right) $$
(6)

The functions μϕ and σϕ are both linear layers that predict the mean and variance of the multivariate diagonal Gaussian distribution from the hidden state vector h, which is the final state output of the LSTM encoder mapping the sequential text input \(\boldsymbol {x}=\left \{x_{1}, x_{2}, \ldots , x_{L}\right \}\).

The operation of sampling the latent variable z from \(q_{\phi }\left (\boldsymbol {z}\mid \boldsymbol {x}\right )\) is not differentiable, so the gradient cannot be computed and back-propagated through it. We therefore apply the reparameterization trick and rewrite the sampling operation as \(\boldsymbol {z}=\mu _{\phi }(\boldsymbol {x})+ \beta {\Sigma }_{\phi }^{\frac {1}{2}}(\boldsymbol {x})\), where β is sampled from \(\mathcal {N}(0, \boldsymbol {I})\).

The decoder module is also an LSTM layer; it takes the latent variable z as its initial hidden state, consumes the text sequence input, and generates a new text sample, thereby modeling the distribution p𝜃(x∣z) conditioned on the latent variable z. The model can be trained with stochastic gradient descent by maximizing the following objective (equivalently, minimizing its negative as the loss), where N is the batch size.

$$ \begin{aligned} \operatorname{Loss}(\boldsymbol{X} ; \phi, \theta)=& \sum\limits_{i=1}^{N} \mathbb{E}_{q_{\phi}\left( \boldsymbol{z}\mid\boldsymbol{x}^{(i)}\right)}\left[\log p_{\theta}\left( \boldsymbol{x}^{(i)} \mid \boldsymbol{z}\right)\right] \\ &-\sum\limits_{i=1}^{N} \mathcal{D}_{KL}\left( q_{\phi}\left( \boldsymbol{z}\mid\boldsymbol{x}^{(i)}\right) \| p(\boldsymbol{z})\right) \end{aligned} $$
(7)

The first term of the equation encourages the model to reconstruct the original text input. The second term is the Kullback-Leibler (KL) divergence, which measures the similarity between the approximate posterior qϕ(z∣x) and the prior p(z).
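
To make the architecture concrete, the following minimal PyTorch sketch implements the pieces described above: an LSTM encoder with two linear heads for the mean and (log-)variance of qϕ(z∣x), the reparameterized sampling of z, an LSTM decoder initialized from z, and the negative ELBO of Eq. (7). Layer sizes, module names, and the conditional decoder input are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch of the recurrent variational autoencoder described above
# (Eqs. 5-7); dimensions and names are assumptions.

class RVAE(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256, z_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)   # q_phi(z | x)
        self.mu = nn.Linear(hid_dim, z_dim)                          # mean head
        self.logvar = nn.Linear(hid_dim, z_dim)                      # log-variance head
        self.z_to_h = nn.Linear(z_dim, hid_dim)                      # z -> decoder initial state
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)   # p_theta(x | z)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, x, cond):
        # x, cond: (batch, seq_len) integer token ids; cond is the decoder's
        # conditional input (the original or masked text, see Section 4.3).
        _, (h, _) = self.encoder(self.embed(x))     # final hidden state h
        mu, logvar = self.mu(h[-1]), self.logvar(h[-1])
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)        # reparameterization trick
        h0 = torch.tanh(self.z_to_h(z)).unsqueeze(0)
        dec, _ = self.decoder(self.embed(cond), (h0, torch.zeros_like(h0)))
        return self.out(dec), mu, logvar            # logits: (batch, seq_len, vocab)

def neg_elbo(logits, targets, mu, logvar):
    # Negative ELBO of Eq. (7): reconstruction loss + KL(q_phi(z|x) || N(0, I)).
    rec = F.cross_entropy(logits.transpose(1, 2), targets, reduction="mean")
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl
```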

4 Proposed Method

This section describes how to introduce differential privacy mechanisms into the recurrent variational autoencoder so that it generates high-utility desensitized text data that is differentially private and from which attackers can hardly infer the original information. In addition, we apply our method to protect personal text data privacy in both centralized and federated learning scenarios.

4.1 Sensitive Text Privacy Preservation Through DP-RVAE

The proposed approach, DP-RVAE, takes the sensitive text data as input and reconstructs it into desensitized text data to protect the sensitive data. The synthetic text data can be used to train the language model while maintaining high utility (Fig. 2). To make federated learning suitable for mobile device environments, we use DistillBERT [10], a smaller and faster version of BERT, as the language model in our case.

Fig. 2

Overview of proposed differentially private synthetic text data generated with the DP-RVAE approach

We train the DP-RVAE generative model on a sensitive dataset D using the differentially private algorithm DP-SGD [7]. DP-SGD injects noise into the stochastic gradients during training to keep the user-level data private, so we can provide a strong guarantee for personal sensitive text data via DP-RVAE. The text generated by DP-RVAE can then be used to train the DistillBERT language model for downstream prediction tasks.

Our model is based on the recurrent variational autoencoder. It mainly involves two modules, an encoder and a decoder, both single-layer LSTMs, to adapt the original autoencoder to text data.

4.2 Differential Privacy with RVAE

In this section, we introduce the details of the differential privacy mechanisms in DP-RVAE (see Fig. 3). We train with the loss function of the recurrent variational autoencoder (Eq. 7) over a mini-batch of N samples X = {x1,x2,…,xN}, where each sample xi is a sentence containing sequential word tokens represented by an integer vector. We pass the word tokens xi into the word embedding module to get a floating-point embedding vector \(\boldsymbol {e}_{\boldsymbol {x}_{\boldsymbol {i}}}\). We use the stochastic gradient descent algorithm to update the model parameters ϕ and 𝜃, where ϕ is the parameter of the encoder module and 𝜃 is the parameter of the decoder module. Suppose a mini-batch of N latent variables Z = {z1,z2,…,zN} is sampled from the encoder \(q_{\phi }\left (\boldsymbol {z} \mid \boldsymbol {e}_{\boldsymbol {x}}\right )\). We then compute the average gradient ηd of the decoder for the batch.

$$ \operatorname{\boldsymbol{\eta}_{d}} = \frac{1}{N} \sum\limits_{i=1}^{N} \nabla_{\theta} \log p_{\theta}\left( \boldsymbol{e}_{\boldsymbol{x}_{\boldsymbol{i}}} \mid \boldsymbol{z}_{i}\right) $$
(8)
Fig. 3

The DP-RVAE architecture

Then we compute the average gradient ηe of the encoder for a batch in the same way.

$$ \operatorname{\boldsymbol{\eta}_{e}} = \frac{1}{N} \sum\limits_{i=1}^{N} \nabla_{\phi}\left[\log p_{\theta}\left( \boldsymbol{e}_{\boldsymbol{x}_{\boldsymbol{i}}} \mid \boldsymbol{z}_{i}\right)-\text{KL}\left[q_{\phi}\left( \boldsymbol{z}_{i} \mid \boldsymbol{e}_{\boldsymbol{x}_{\boldsymbol{i}}}\right) \| p(\boldsymbol{z}_{i})\right]\right] $$
(9)

In our work, we only perturb the gradients of the encoder module, which achieves better results while still ensuring security. First, we clip the gradient of the encoder module with the gradient norm bound parameter C.

$$ \operatorname{\boldsymbol{\eta}_{e}}=\operatorname{Clip}\left( C, \operatorname{\boldsymbol{\eta}_{e}}\right) $$
(10)

Secondly, we add Laplace noise to the gradient of the encoder module to make it differentially private, where σs is the noise scale.

$$ \operatorname{\boldsymbol{\eta}_{e}}=\operatorname{\boldsymbol{\eta}_{e}}+\operatorname{Laplace}\left( 0, {\sigma_{s}^{2}} C^{2} \boldsymbol{I}\right) $$
(11)

Finally, we apply the stochastic gradient descent algorithm to optimize all model parameters. Note that the model's training is differentially private because of the post-processing theorem of differential privacy. To keep the model differentially private, we use the <unk> token as the conditional text input of the decoder module, which preserves the sensitive input text while still generating personalized text.
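
A minimal sketch of the encoder-side perturbation in Eqs. (10)-(11) is shown below: the batch-averaged encoder gradient is clipped to L2 norm C and then perturbed with per-coordinate Laplace noise. Reading Laplace(0, σs²C²I) as Laplace noise of scale σs·C is our assumption, and the function and parameter names are illustrative.

```python
import torch

# Hedged sketch of the encoder gradient clipping and Laplace perturbation
# (Eqs. 10-11); the scale sigma_s * C is an assumed parameterization.

def perturb_encoder_gradients(encoder_params, C=1.0, sigma_s=0.5):
    grads = [p.grad for p in encoder_params if p.grad is not None]
    # Eq. (10): rescale so that the global gradient norm is at most C.
    total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
    clip_coef = (C / (total_norm + 1e-12)).clamp(max=1.0)
    for g in grads:
        g.mul_(clip_coef)
        # Eq. (11): add Laplace noise to every coordinate of the clipped gradient.
        noise = torch.distributions.Laplace(0.0, sigma_s * C).sample(g.shape)
        g.add_(noise.to(g.device))

# Assumed usage inside a training step:
#   loss.backward()
#   perturb_encoder_gradients(model.encoder.parameters())
#   optimizer.step()
```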

4.3 Noise Perturbing and Word Dropout

In the previous section, the entire conditional text input of the decoder module consists of <unk> tokens, so the generated result may be random and ultimately change the meaning of the input text. We therefore use the original text, instead of all <unk> tokens, as the conditional input of the decoder module. To ensure the model training still satisfies differential privacy, we introduce word dropout and noise perturbation mechanisms to process the original text before it becomes the conditional input of the decoder module. Under the word dropout mechanism, each word in the original text has an equal probability of being replaced with the <unk> token. Furthermore, we add Laplace noise to the conditional input of the decoder module. The combination of word dropout and noise perturbation trades off utility and privacy and has been proved to be a formal differential privacy mechanism [63].

For a sensitive text input of the encoder module \(\boldsymbol {x}=\left \{x_{1}, x_{2}, \ldots , x_{L}\right \}\), we first use the word dropout method to randomly mask words in the text input. More concretely, we apply a mask vector \(\boldsymbol {I}_{mask}\in{\left \{0,1\right \}}^{L}\) with dropout probability ρ to the input text as x ⊙ Imask, where the positions of the zeros in Imask follow a uniform distribution. The word xi is replaced by the <unk> token if the element at the corresponding position of Imask equals 0. When the dropout rate ρ = 1, all words are replaced by the <unk> token, as described in the previous section. Moreover, we employ an embedding layer to represent the feature of each word, which can be denoted by:

$$ \boldsymbol{F}=\operatorname{Embedding}(\boldsymbol{x})=\left\{\boldsymbol{e}_{x_{1}},\ \boldsymbol{e}_{x_{2}},\ldots,\boldsymbol{e}_{x_{L}}\right\} $$
(12)

Then we inject Laplace-distributed noise into the word features F so that the mechanism satisfies the 𝜖-DP guarantee as follows.

$$ \boldsymbol{\hat{F}}=\operatorname{T}(\boldsymbol{F})=\boldsymbol{F}+\operatorname{Laplace}(\gamma) $$
(13)

where \(\gamma =\frac {\Delta f}{\epsilon }\) is the scale of the Laplace noise, Δf is the sensitivity of the differential privacy mechanism, and 𝜖 is the privacy budget. Here we bound the sensitivity of each element of the text by 1 (i.e., Δf = 1), and T(F) is differentially private.

The combination of the word dropout operation with the differentially private noise perturbation is still differentially private, and the privacy budget is reduced to:

$$ \epsilon=\ln \left[(1-\rho) e^{\frac{1}{\gamma}}+\rho\right] $$
(14)

According to the composition theorem of differential privacy, we can still train DP-RVAE while satisfying differential privacy. We denote the generated text Y for the input text x as:

$$ \boldsymbol{Y}=\text{DP-RVAE}\left( \boldsymbol{x}, \operatorname{T}\left( \boldsymbol{x} * \boldsymbol{I}_{\text {mask}}\right), \rho, \gamma\right) $$
(15)
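
The following minimal sketch illustrates the decoder conditional-input pipeline of Eqs. (12)-(15): words are randomly replaced with <unk> (word dropout), embedded, and perturbed with Laplace noise of scale γ; the effective privacy budget of Eq. (14) is also computed. Token ids, the embedding matrix, and the function names are illustrative assumptions.

```python
import math
import numpy as np

# Hedged sketch of word dropout plus Laplace noise on the decoder's
# conditional input (Eqs. 12-15); names and ids are illustrative.

def dp_conditional_input(token_ids, embedding_matrix, unk_id, rho=0.6, gamma=0.001):
    token_ids = np.asarray(token_ids)
    drop = np.random.rand(len(token_ids)) < rho                   # drop with prob. rho
    masked = np.where(drop, unk_id, token_ids)                    # replace with <unk>
    feats = embedding_matrix[masked]                              # Eq. (12): F = Embedding(x)
    noisy = feats + np.random.laplace(0.0, gamma, feats.shape)    # Eq. (13): T(F)
    return masked, noisy

def effective_budget(rho, gamma):
    # Eq. (14): privacy budget of the combined word-dropout + Laplace mechanism.
    return math.log((1.0 - rho) * math.exp(1.0 / gamma) + rho)
```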

4.4 DP-RVAE with Federated Learning

In the federated learning setup (Fig. 4), we deploy DP-RVAE to s clients. In the initial round, we use a public corpus to train the DP-RVAE on the central server and then broadcast it to each client. The language model DistillBERT is pre-trained. We initialize the classifier randomly and broadcast it to each client for prediction tasks.

Fig. 4

The DP-RVAE in the federated learning setting

In each round, the clients receive the latest global DP-RVAE model MGlobal and DistillBERT LM from the server. We train the DP-RVAE on the sensitive dataset Di on the client side. The local model Mi learns the features of individual text, and its parameters \({\widetilde {\theta }}_{i}\) are updated. DP-RVAE then generates a corresponding personalized synthetic dataset \({\widetilde {D}}_{i}\), and the local DistillBERT language model LM performs prediction on the generated synthetic dataset \({\widetilde {D}}_{i}\).

Each client uploads the updated parameters \({\widetilde {\theta }}_{i}\) and the synthetic dataset \({\widetilde {D}}_{i}\) to the central server. The server aggregates the updated parameters and synthetic datasets from all clients. For the global DP-RVAE, we use the FedOPT federated learning algorithm to update the model parameters, which accounts for the large amount of heterogeneous text data from different clients. The server first calculates the aggregated local model parameter change as \({\Delta }={{\sum }_{i}^{S}} p_{i} {\widetilde {\theta }}_{i} / {{\sum }_{i}^{S}} p_{i}\), where pi is the weight of client i. For simplicity, we assume all clients have the same weight. The global DP-RVAE MGlobal is then updated according to the aggregated parameter change Δ. The centralized synthetic dataset \(\widetilde {D}\) can be directly used to fine-tune the DistillBERT language model. With this paradigm, the language model can improve its performance and reduce the negative impact of the distributed dataset in the federated learning setting.
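
A minimal sketch of the server-side update is given below. It treats the weighted average of client parameter changes as a pseudo-gradient and applies a server learning rate, which is the usual FedOPT-style formulation; with equal weights and a server learning rate of 1 it reduces to plain parameter averaging. The dict layout, the weights pi, and the choice of a plain-SGD server optimizer are assumptions.

```python
import numpy as np

# Hedged sketch of a FedOPT-style server update for the global DP-RVAE.
# global_params: dict name -> np.ndarray; client_params: list of such dicts.

def fedopt_server_update(global_params, client_params, weights=None, server_lr=1.0):
    s = len(client_params)
    w = np.ones(s) / s if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    updated = {}
    for name, g in global_params.items():
        # Aggregated change: Delta = sum_i p_i * (theta_tilde_i - theta_global).
        delta = sum(wi * (c[name] - g) for wi, c in zip(w, client_params))
        updated[name] = g + server_lr * delta    # server "gradient" step
    return updated
```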

5 Experiment and Security Analysis

In this section, we report the experimental results of DP-RVAE on two text classification datasets, Tweets Depression Sentiment [64] and IMDB Reviews [65]. We utilize Opacus for our experiments and analyze the trade-off between privacy and the utility of the generated text. First, we use the generated synthetic text data for downstream NLP tasks with the DistillBERT language model and compare its prediction accuracy to a benchmark differentially private DistillBERT model trained on the real-world datasets, evaluating utility while consuming the same privacy budget in the centralized setting. Second, we perform a keyword inference attack experiment to demonstrate the privacy-preserving capability of DP-RVAE. Finally, we test DP-RVAE with our proposed federated learning paradigm and compare it to typical NLP federated learning with DistillBERT on the same NLP tasks.

5.1 Implementation Details

We describe the exact model’s hyper-parameters settings and include all of the details of the datasets for implementing the mechanisms and models practically.

Tweets Depression Sentiment Dataset

The Tweets Depression Sentiment dataset [64] includes tweets scraped from Twitter for detecting depression tendency on the web; data cleaning was performed while scraping. There are 2,477 samples for training and 619 samples for testing.

IMDB Reviews Dataset

The IMDB Reviews dataset [65] contains 50,000 samples for binary sentiment classification, substantially more data than previous benchmark datasets, and each sample has an accurate sentiment annotation.

Medical Description Dataset

The medical description data comes from the CMS public healthcare records. Following the practice of Pan et al. [5], we pre-process the textual Healthcare Common Procedure Coding System (HCPCS) descriptions and use word matching to find the sentences that include the 10 keywords (e.g., head, hand, and face). We obtain a medical description dataset containing 200,000 sentences and use these sentences to pre-train the DistillBERT language model for the sentence embeddings.

Hyper-parameters Settings

For DP-RVAE, we set the noise scale γ = 0.001, the word dropout rate ρ = 0.6 in the encoder module, and the clip norm bound C = 1.0 in the decoder module. We use the DP-SGD optimizer for DP-RVAE with a learning rate of 0.05. For the differentially private benchmark model, we use the Adam optimizer with a learning rate of 0.0001. We use two fully-connected neural network layers as the classifier model and the cross-entropy loss as the text classification model's loss function.
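
As a minimal sketch of how DP training can be wired with Opacus (the library used in our experiments), the snippet below attaches a PrivacyEngine to a toy model, optimizer, and data loader and reads back the spent 𝜖 from its RDP-based accountant. Note that Opacus injects Gaussian noise into per-sample gradients, whereas Eq. (11) describes Laplace noise on the averaged encoder gradient; the toy model, data, and hyper-parameter values below are placeholders, not our exact configuration.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# Hedged sketch of DP-SGD training with Opacus; toy model and data only.
model = nn.Linear(16, 2)                                  # stand-in for the DP-RVAE
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
data = TensorDataset(torch.randn(256, 16), torch.randint(0, 2, (256,)))
loader = DataLoader(data, batch_size=32)

privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private(
    module=model, optimizer=optimizer, data_loader=loader,
    noise_multiplier=1.0,     # noise scale
    max_grad_norm=1.0,        # gradient clipping bound C
)

for x, y in loader:           # one DP training epoch
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()

print("epsilon spent:", privacy_engine.get_epsilon(delta=1e-5))
```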

5.2 The Utility Evaluation of the Synthetic Data

For the utility evaluation of the synthetic text dataset generated by our model, we first train DP-RVAE on the realistic dataset. Utility here means the performance of the classifier model on downstream tasks when trained on the synthetic data, and we use accuracy as the evaluation metric. To obtain the same labels as the actual dataset, we apply the word dropout and noise perturbation mechanism to the original text used as the additional input of the encoder module. Then we train the pre-trained language model with the classifier of the specific NLP task on the synthetic dataset. We track the privacy budget spent in our algorithm with the Renyi-DP accountant [62]. For each dataset, we use 80% of the data for training and the rest for testing, with a batch size of 32. As the benchmark, we choose the differentially private pre-trained DistillBERT language model with two fully-connected layers, a Tanh activation function, and a Sigmoid output for prediction, trained directly on the realistic dataset. We run the experiment with different target privacy budget values; a privacy budget of \(\epsilon =\infty \) means no privacy guarantee, and the failure probability is δ = 1 × 10− 5.

As shown in Table 1, our model performs better on both datasets under a low privacy budget. Compared with the DP-DistillBERT model, for example, our DP-RVAE reaches an average test accuracy of 54.10% versus 48.20% for DP-DistillBERT, an improvement of 5.90% under a highly tight privacy level (i.e., 𝜖 = 0.5). Without a privacy budget (i.e., \(\epsilon =\infty \)), our model's average test accuracy is 72.13% while the benchmark DP-DistillBERT reaches 77%, a reduction of only 4.87%. This indicates that the synthetic text data generated by our DP-RVAE has only a small gap with the actual data.

Table 1 Average test accuracy of models trained in a centralized setting

According to the experimental results, our model can generate high-utility synthetic text, and DistillBERT can still learn the feature information from the synthetic text data. More experimental details on the Tweets Depression Sentiment dataset are shown in Fig. 5: for a fixed and formal privacy level \(\left (\epsilon \le 7.5\right )\), our model consistently and significantly outperforms DP-DistillBERT. As the privacy budget increases, DP-DistillBERT gradually approaches DistillBERT trained on the original text data; consequently, it achieves higher accuracy than DP-RVAE but with almost no privacy guarantee.

Fig. 5

Test accuracy comparison of models for various privacy budgets 𝜖 on Tweets Depression Sentiment dataset

In addition, we find that as the privacy budget decreases, it becomes harder for the classifier to make the right decision, which indicates that the text features become more difficult for the language model to capture. From another perspective, this provides more robust privacy: even if attackers hijack the synthetic text, they still cannot obtain sensitive information.

The impact of the failure probability parameter δ on the average accuracy of DP-RVAE is shown in Fig. 6. The test accuracy with different values of δ is almost equal under the same privacy budget. We also observe that a larger δ can result in a more significant bias in test accuracy. According to the definition of differential privacy, a larger δ means more noise injection when the privacy budget 𝜖 is fixed. We therefore conclude that the noise scale can affect DP-RVAE's performance.

Fig. 6

Test accuracy comparison of failure probability δ for various privacy budgets 𝜖 on Tweets Depression Sentiment dataset with DP-RVAE

To demonstrate the effectiveness of DP-RVAE in generating high-utility sentences, we sample generated sentences under three privacy budgets: 𝜖 = 0.5, 𝜖 = 5, and \(\epsilon =\infty \) (i.e., without DP).

As shown in Table 2, as the privacy budget decreases, the generated text becomes more confusing and the <unk> token occurs more frequently. Despite that, we can still distinguish the sentiment of the generated text, which is similar to that of the original text. The generated text prevents the disclosure of sensitive information such as the "1st Birthday party" and "headache" states.

Table 2 A sample of generated text for various privacy budgets 𝜖 on Tweets Depression Sentiment dataset

5.3 Keyword Inference Attack Experiment

We evaluate the defence capability of DP-RVAE with the DistillBERT language model against the deep artificial neural network (DANN) based keyword inference attack proposed by Pan et al. [5] on the medical description dataset. We assume the attacker knows the ten exact sensitive keywords (e.g., head, hand, and face) and infers whether a target sentence embedding contains a specific keyword. We report the average attack accuracy over the ten sensitive keywords under different privacy budgets.

From Table 3, the average keyword inference attack accuracy against our DP-RVAE decreases by 15.2% compared to the typical differentially private approach on the Medical Description dataset. The keyword inference attack accuracy is only 8.63% when the privacy budget 𝜖 is set to 1, so the attacker can barely obtain the critical sensitive keyword information from the text generated by our DP-RVAE. Figure 7 shows the overall average accuracy of the keyword inference attack. DP-DistillBERT and DP-RVAE can weaken the keyword inference attack to random guessing when the privacy budget is below 1.5, in contrast to DistillBERT without privacy guarantees. The attack accuracy against our DP-RVAE is 53% even under a high privacy budget (i.e., 𝜖 = 2.5), which is 18% lower than against the DP-DistillBERT model.

Table 3 Average accuracy of keyword inference attack on the medical description dataset
Fig. 7

Keyword inference attack average accuracy comparison of models for various privacy budgets 𝜖 on Medical Description dataset

To demonstrate the defence capability of DP-RVAE, we show more detailed results of the keyword inference attack experiment for each keyword in Fig. 8. For DistillBERT, several specific keywords, such as the word hip, can be obtained by the attackers easily; for example, the attack accuracy for the word hip reaches around 86%. When the privacy budget is 1, the attack accuracy for the word hip decreases to 23%. We also observe that DP-RVAE provides better defence: the attack accuracy for each keyword ranges from 11% to 25%, so DP-RVAE achieves a more robust defence capability than the DP-DistillBERT model. When the privacy budget is 0.5, the attackers can barely infer any keyword from the sensitive sentence embedding; in particular, the attack accuracy for the word hip decreases significantly to 8.2%.

Fig. 8

Keyword inference attack accuracy for each keyword comparison of models on the medical description dataset in privacy budgets 𝜖 = 0.5,1,1.5

5.4 An Experiment in Federated Learning

In the federated learning setting, we use the same configuration for each client. We partition the dataset into N equal parts at random to simulate the experiment. To validate the effectiveness of our paradigm, we compare it to the typical federated learning with DistillBERT on the NLP task mentioned in [21]. We run the experiment in the cross-silo setting, and the same clients are selected in each round.

In more detail, we use 10 clients for the text classification task on the Tweets Depression Sentiment and IMDB Reviews datasets. In each round, the number of local epochs is set to 2. The DP-RVAE and DistillBERT hyper-parameter settings are the same as in the previous centralized experiment. We apply differential privacy locally and aggregate each client's model with the Federated Optimization (FedOPT) algorithm. We retain 80% of the dataset as training data and randomly split it into 10 equal parts; the remaining 20% of the dataset is used for testing on the server side. The model training phase stops when the privacy budget is exhausted.
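
For reference, a minimal sketch of this cross-silo data split is shown below: 80% of the samples are shuffled and divided equally among the 10 clients, and the remaining 20% are kept as the server-side test set. The function name and the fixed random seed are illustrative.

```python
import numpy as np

# Hedged sketch of the 80/20 split with 10 equal random client shards.
def split_for_clients(num_samples, num_clients=10, train_frac=0.8, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(num_samples)
    cut = int(train_frac * num_samples)
    client_shards = np.array_split(idx[:cut], num_clients)  # equal random shards
    test_idx = idx[cut:]                                     # held out on the server
    return client_shards, test_idx
```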

From the results presented in Table 4, our paradigm achieves higher accuracy than typical NLP federated learning under a lower privacy budget. Our federated learning paradigm with DP-RVAE improves the average test accuracy by 4.94% on the two datasets under a high privacy guarantee (i.e., 𝜖 = 0.5). We argue that our DP-RVAE can still generate high-quality, personalized synthetic text data for each client and can thereby simulate a realistic dataset. Furthermore, without a privacy guarantee (i.e., \(\epsilon =\infty \)), the test accuracy of our paradigm improves by 6.33% on the Tweets Depression Sentiment dataset and by 1.96% on the IMDB Reviews dataset compared to the typical federated learning framework.

Table 4 Average test accuracy of models trained in a federated learning setup

Additionally, our paradigm achieves higher accuracy under various privacy budgets (see Fig. 9) than the typical DP-DistillBERT with federated learning. We argue that the language model performs better when trained centrally, even on a simulated dataset, than when trained on a real dataset in the distributed setup.

Fig. 9

Test accuracy comparison of models for various privacy budgets 𝜖 on the Tweets Depression Sentiment dataset in the federated learning setting. The FL-DistillBERT is built with DistillBERT trained in the federated learning setting without DP guarantee

Moreover, we compare the test classification accuracy of the original federated learning with DP-DistillBERT and our DP-RVAE federated learning paradigm under various communication rounds R on the Tweets Depression Sentiment dataset. Notably, we fix the total privacy budget at 𝜖 = 30, and the number of local epochs is set to 1 in each communication round for the model training phase (see Fig. 10).

Fig. 10

Test accuracy comparison of models for various rounds R when exhausting a fixed privacy budget (𝜖 = 30) on the Tweets Depression Sentiment dataset in the federated learning setting

As a result, our DP-RVAE with DistillBERT in the federated learning paradigm begins to converge and achieves a test accuracy of 62% at round R = 3. Within the given privacy budget 𝜖 = 30, the original DP-DistillBERT federated learning approach exhausts the privacy budget at round R = 8, and its highest test accuracy is only 56%. From this comparison, we can conclude that our federated learning paradigm achieves better test accuracy under a given privacy budget. In other words, our federated learning paradigm consumes less privacy budget and fewer rounds for a more robust privacy guarantee while achieving better performance.

5.5 Analysis of DP-RVAE Model

5.5.1 Computation Complexity Analysis

Our DP-RVAE preserves text privacy for the NLP tasks by generating simulated data. The computation of DP-RVAE comprises the RVAE module, the Laplace noise injection, and the word dropout mechanism. The RVAE module is a standard sequence-to-sequence model with two single-layer LSTMs, so its cost is linear in the text sequence length. Hence, the time complexity of the RVAE module is O(m + t) per sample, where m is the input text length and t is the generated text length. The time complexity of the Laplace noise injection and the word dropout mechanism is small and proportional to the data size n. As a result, the total time complexity of DP-RVAE is O((m + t)n + n).

In comparison, the time complexity of DP-DistillBERT is O(m2dn + n), where m is the input length, d is the dimension of the hidden vectors, and n is the complexity of the noise injection operation, which is equal to the data size. Thus, training DP-RVAE takes less time than training DP-DistillBERT for one epoch. In the federated learning setting, training the DP-DistillBERT model with the typical federated learning method would place a heavy computing burden on the clients. On the contrary, our paradigm trains DP-RVAE on the client side and generates the synthetic text once, reducing the clients' computation costs.

As opposed to our federated learning paradigm, another mainstream privacy-preserving federated learning framework is based on the Secure Multi-party Aggregation (SMA) method proposed by Bonawitz et al. [66]. The updated DistillBERT model parameters from each client can be aggregated safely by the SMA. However, this method causes additional computational costs for the client device. More specifically, each client performs 2s key agreements and creates t-out-of-n Shamir secret shares, then generates s − 1 values for every other client for each entry in the input vector by stretching one pseudorandom generator (PRG) seed each. Consequently, each client's additional computational cost is O(s2 + sn), where s is the number of clients and n is the data size. Compared with our paradigm, this is an enormous computation cost.

5.5.2 Security Analysis

For the security analysis, we mainly consider two potential privacy attacks against the deep learning model: gradient leakage attacks and keyword inference attacks. First, we consider that an honest-but-curious client can act as a passive attacker to infer other clients' sensitive information from the gradients when the DP-RVAE parameters are aggregated in the federated learning setting. Assuming the attacker can access the updated DP-RVAE parameters or gradients, they can use gradient leakage attack methods to obtain the plain text, even though there is no practical attack of this kind on NLP federated learning yet and most existing research focuses on the computer vision domain.

In this case, differential privacy with gradient noise perturbation can effectively prevent the gradient leakage attack. A mechanism provides differentially private preservation if it satisfies the (𝜖,δ)-DP definition. Differential privacy, introduced in Section 3.1, is a strictly mathematical definition of data privacy that assumes the attacker has exhaustive background knowledge, and it can therefore prevent any such attack in theory. Following this, we perform a differential privacy analysis for the DP-RVAE model, which satisfies differential privacy and allows quantitative privacy analysis.

For the encoder module, we adopt the stochastic gradient descent training algorithm proposed by Abadi et al. [7] to implement differential privacy. Based on the RDP accounting mechanism, the encoder module is \(\left (\mathcal {O}(q \epsilon \sqrt {t})+\frac {\log (1 / \delta )}{\alpha -1}, \delta \right )\)-differentially private, where t is the number of training steps and q is the training data sampling probability. We employ the noise perturbation and word dropout mechanisms to make the decoder module 𝜖-differentially private, with a privacy budget of \(\epsilon =\ln [(1-\rho ) \exp (\frac {\Delta f}{\gamma })+\rho ]\). According to Theorem 1 (i.e., the composition theorem), the combination of the encoder and decoder modules still achieves a differential privacy guarantee for sensitive data. In conclusion, our DP-RVAE can prevent the gradient leakage attack in the federated learning setting.

Secondly, the keyword inference attack is another potential privacy attack, a variant of the model inversion attack. We suppose that the attacker has compromised the clients and can access the outputs of the DistillBERT language model when the victim client utilizes the trained DistillBERT to predict a specific NLP task. The attacker could then reconstruct the plain text from the text embedding and judge whether a sensitive keyword belongs to the original input text, which is extremely dangerous for language models. In our approach, DistillBERT uses the synthetic text generated by DP-RVAE to train and predict the downstream tasks. The generated text data is still differentially private according to Theorem 2 (i.e., the post-processing theorem), so the client can utilize the generated text to train the DistillBERT language model without privacy leakage. The attacker can barely infer the sensitive keyword from the text embedding even when DistillBERT is trained without privacy-preserving algorithms. In addition, we have experimented on DP-RVAE with the DistillBERT language model under the keyword inference attack [5] on the medical dataset to demonstrate its privacy security (see Section 5.3).

6 Conclusion and Future Work

This paper presents DP-RVAE, which generates simulated text for downstream task model training with a formal differential privacy guarantee. In addition, we propose a training paradigm based on DP-RVAE in federated learning. Our experimental results show that DP-RVAE can generate high-utility text data, and the language model can be trained on the synthetic text effectively. Even though each client has a limited dataset in the federated learning setting, the proposed training paradigm can obtain better accuracy on the NLP downstream tasks. Because of the absence of a real-world non-IID NLP dataset, we could not further evaluate the performance of our DP-RVAE on non-IID data in the federated learning setting. As future work, we plan to explore how to generate more customized text to improve the accuracy of the NLP tasks. We also intend to optimize DP-RVAE for lower computational complexity so that it can be better applied to mobile devices in the federated learning setting.