Introduction

The widespread collection of user data in modern machine learning has raised concerns regarding privacy violations and the potential disclosure of sensitive personal information [1, 2]. To address these concerns, Federated Learning (FL) [3] was introduced as a collaborative machine learning paradigm in which users’ devices train a global predictive model without transmitting raw data to a central server. While FL promises to preserve user privacy while maintaining model performance, the heterogeneity of data distributions among clients can lead to challenges such as reduced model utility and convergence issues during training. In response, personalized federated learning approaches have emerged, aiming to tailor models to clusters of users with similar data distributions [4,5,6].

Furthermore, it has been demonstrated that avoiding the release of users’ raw data alone does not provide sufficient protection against potential privacy violations [7,8,9]. To address this issue, researchers have explored the application of differential privacy (DP) [10, 11] to federated learning, providing privacy guarantees for users participating in the optimization process. DP mechanisms introduce randomness in the model updates released by clients, making each user’s contribution to the final model probabilistically indistinguishable up to a certain likelihood factor. To bound this factor, the domain of secrets (i.e., the parameter space in FL) is artificially constrained, either to offer central [12, 13] or local DP guarantees [14, 15]. However, constraining the optimization process to a subset of \(\mathbb {R}^n\) can have negative effects, such as when the optimal model parameters for a particular cluster of users lie outside such a bounded domain.

To address the challenges of personalization and local privacy protection, this work proposes the adoption of a more general notion of DP called d-privacy, or metric-based privacy [16], which has recently attracted attention mainly in the context of location privacy [17,18,19]. This notion of privacy does not require a bounded domain and provides guarantees that scale with the distance between any two points in the parameter space. Therefore, assuming that clients with similar data distributions have similar optimal fitting parameters, d-privacy offers strong indistinguishability guarantees within such groups. Conversely, the privacy guarantees degrade gracefully for clients with significantly different data distributions.

In addition to addressing privacy concerns in personalized FL as was studied in [20], this work extends the analysis and investigates the impact of the proposed method on fairness aspects in federated model training. As machine learning-based decision systems become more prevalent, it has become apparent that many of these systems exhibit gender and racial biases that disproportionately affect minority populations [21, 22]. Therefore, beyond protecting user privacy, it is crucial to explore cutting-edge machine learning algorithms that can potentially mitigate this pervasive lack of fairness among participating clients. However, systems aiming to protect privacy while ensuring fairness often involve a trade-off between the two [23]. This trade-off arises because privacy protection techniques based on DP tend to minimize the impact of outliers or minorities within the overall dataset. In other words, the application of d-privacy, a metric-based generalization of DP, to personalized FL could potentially compromise the fairness of the machine learning model. Building upon [20], this paper presents extensive experimental results demonstrating that the use of personalized FL under group privacy guarantees not only significantly improves fairness compared to the classical (non-personalized) FL framework, but also maintains a relatively small trade-off between privacy and fairness.

In summary, this paper makes the following contributions: it extends the work pursued in [20] (points 1 and 2) and investigates the implications of our proposal for the fairness of the model (point 3):

  1.

    A novel algorithm is put forward for collaborative training of machine learning models, leveraging advanced techniques for model personalization and addressing user privacy concerns by formalizing privacy guarantees in terms of d-privacy.

  2.

    This research studies the Laplace mechanism under the Euclidean distance, providing a closed-form expression for its generalization in \(\mathbb {R}^n\) as well as an efficient sampling procedure.

  3.

    It shows that personalized federated learning under formal privacy guarantees improves group fairness significantly compared to the non-personalized federated learning framework and, hence, establishes that this method enhances the trade-off between privacy and fairness.

The rest of this paper is organized as follows. Section “Background” introduces the relevant foundations of federated learning, differential privacy, and fairness notions. Section “Related Works” discusses the related works for our research. Section “An Algorithm for Private and Personalized Federated Learning” explains the proposed algorithm for personalized federated learning with group privacy. Section “Experiments” illustrates how the proposed method works in terms of privacy and fairness, and Section “Conclusion” provides our concluding remarks.

Background

Personalized Federated Learning

The problem of personalized federated learning falls within the framework of stochastic optimization, and the notation from [4] is adopted here to determine the set of minimizers \(\theta _j^* \in \mathbb {R}^n\) with \(j \in \left\{ 1, \dots , k \right\}\) of the cost functions

$$\begin{aligned} F(\theta _j) = \mathbb {E}_{z\sim \mathcal {D}_j} \left[ f(\theta _j;\,\,z)\right] , \end{aligned}$$
(1)

where \(\{\mathcal {D}_1,\ldots ,\mathcal {D}_k\}\) are the data distributions, which cannot be accessed directly but only through a collection of client datasets \(Z_c=\left\{ z \mid z \sim \mathcal {D}_j, z \in \mathbb {D} \right\}\) for some \(j\in \{1,\ldots ,k\}\), with \(c \in C = \left\{ 1, \dots , N \right\}\) the set of clients and \(\mathbb {D}\) a generic domain of data points. C is partitioned into k disjoint sets

$$\begin{aligned} S_j^* = \{c\in C \mid \forall z \in Z_c, \, z \sim \mathcal {D}_j\} \quad \forall \,j\in \{1,\ldots ,k\} \end{aligned}$$
(2)

The mapping \(c \rightarrow j\) is unknown and it is necessary to rely on estimates \(S_j\) of the membership of \(Z_c\) to compute the empirical cost functions

$$\begin{aligned} \tilde{F}(\theta _j) = \frac{1}{|S_j|}\sum _{c \in S_j} \tilde{F}_c(\theta _j;\, Z_c); \quad \tilde{F}_c(\theta _j;\, Z_c) = \frac{1}{|Z_c|}\sum _{z_i \in Z_c}f(\theta _j;\, z_i) \end{aligned}$$
(3)

The cost function \(f :\mathbb {R}^n\times \mathbb {D} \mapsto \mathbb {R}_{\ge 0}\) is evaluated at \(z \in \mathbb {D}\) and parametrized by the vector \(\theta _j \in \mathbb {R}^n\). Thus, the optimization aims to find, \(\forall \,j\,\in \{1,\ldots ,k\}\),

$$\begin{aligned} \tilde{\theta }_j^* = {\mathrm{arg\,min}}_{\theta _j}\tilde{F}(\theta _j) \end{aligned}$$
(4)
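To make the optimization concrete, the following minimal NumPy sketch (ours, not the paper’s implementation; the helper names empirical_cost, select_hypothesis, and squared_error are hypothetical) evaluates the empirical cost of Eq. (3) on one client’s dataset for each hypothesis and selects the minimizer as in Eq. (4).

```python
import numpy as np

def empirical_cost(theta, Z, f):
    """Empirical cost of Eq. (3) on one client's dataset Z = [z_1, ..., z_m]."""
    return float(np.mean([f(theta, z) for z in Z]))

def select_hypothesis(hypotheses, Z, f):
    """Index of the hypothesis minimizing the client's empirical cost, as in Eq. (4)."""
    return int(np.argmin([empirical_cost(theta_j, Z, f) for theta_j in hypotheses]))

def squared_error(theta, z):
    """Example cost for a linear model y = x^T theta, with z = (x, y)."""
    x, y = z
    return float((x @ theta - y) ** 2)

# Illustrative usage with two hypotheses and a two-sample client dataset.
hypotheses = [np.array([5.0, 6.0]), np.array([4.0, -4.5])]
Z_c = [(np.array([1.0, 2.0]), 17.3), (np.array([0.5, -1.0]), -3.1)]
j_hat = select_hypothesis(hypotheses, Z_c, squared_error)
```

In an iterative scheme such as IFCA, each client would repeat this selection at every round against the current set of hypotheses broadcast by the server.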

Privacy

d-privacy, introduced in [16], extends the concept of differential privacy (DP) to any domain \(\mathcal {X}\), which represents the original data space and is equipped with a distance measure \(d:\mathcal {X}^2\mapsto \mathbb {R}_{\ge 0}\), along with a space of secrets \(\mathcal {Y}\). A random mechanism \(\mathcal {R}: \mathcal {X} \mapsto \mathcal {Y}\) is considered \(\varepsilon\)-d-private if, for any \(x_1,x_2\in \mathcal {X}\) and measurable \(S\subseteq \mathcal {Y}\), the inequality in Eq. (5) holds:

$$\begin{aligned} \mathbb {P}\left[ \mathcal {R}(x_1) \in S\right] \le e^{\varepsilon d(x_1,x_2)} \mathbb {P}\left[ \mathcal {R}(x_2) \in S\right] \end{aligned}$$
(5)

It is important to note that when \(\mathcal {X}\) corresponds to the domain of databases and d represents the distance based on the Hamming graph of their adjacency relation, Equation (5) aligns with the standard definition of DP in [10, 11]. However, in this study, \(\theta \in \mathbb {R}^n\) is considered as both the domain \(\mathcal {X}\) and the space of secrets \(\mathcal {Y}\). The primary motivation behind employing d-privacy is to preserve the topology of the parameter distributions among clients. Specifically, it aims to ensure that clients with similar model parameters in the non-privatized space \(\mathcal {X}\) will communicate approximate model parameters in the privatized space \(\mathcal {Y}\), on average.
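To make the definition tangible, the small sketch below (ours, purely illustrative) numerically checks the pointwise density-ratio version of Eq. (5) for the one-dimensional Laplace mechanism under the Euclidean distance; the bound for an arbitrary measurable set S then follows by integration.

```python
import numpy as np

eps = 0.5
x1, x2 = 1.0, 4.0                    # two secrets in the domain X = R
xs = np.linspace(-10.0, 10.0, 2001)  # output grid standing in for points of a set S

def laplace_density(x, center, eps):
    # One-dimensional Laplace density K * exp(-eps * |x - center|), with K = eps / 2.
    return 0.5 * eps * np.exp(-eps * np.abs(x - center))

ratio = laplace_density(xs, x1, eps) / laplace_density(xs, x2, eps)
bound = np.exp(eps * abs(x1 - x2))   # e^{eps * d(x1, x2)}
assert np.all(ratio <= bound + 1e-12)
print(f"max density ratio = {ratio.max():.3f}  <=  bound = {bound:.3f}")
```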

Fairness

With the recent surge of interest in building ethical ways to train machine learning models, the topic of fairness in machine learning has been in the spotlight and, correspondingly, various metrics and algorithms to quantify and establish fairness in model training have been studied from a variety of perspectives and in different contexts [24,25,26]. Most fairness metrics consider the simple case of having a privileged group and an unprivileged group in the population. Under this assumption, typically one attribute of the dataset is selected as a sensitive attribute (e.g., gender, race, etc.) that defines the privileged and the unprivileged groups. The goal of fairness in machine learning is to ensure fair and non-discriminatory results regardless of membership in the groups defined by the sensitive attribute. The two main notions of fairness considered by the community are individual fairness and group fairness: individual fairness [27] requires that similar individuals be treated similarly, while group fairness requires that different demographic subgroups receive equal treatment with respect to their sensitive attributes. While both notions of fairness are important, this work focuses on group fairness because our goal is to analyze and mitigate the potential bias against certain groups (e.g., demographic groups) through personalization techniques. The following metrics are considered for evaluating group fairness as a part of this work.

In the rest of the paper, \(\hat{Y}=1\) and \(\hat{Y}=0\) are used to represent the positive and negative predictions, respectively, and \(S=1\) and \(S=0\) to represent the privileged and unprivileged groups.

The simplest notion of fairness to be proposed was demographic parity [27].

Definition 2.1

Demographic parity is achieved by a system when the prediction \(\hat{Y}\) of the target label Y is statistically independent of the sensitive attributes S, i.e.,

$$\begin{aligned} \mathbb {P}\left[ \hat{Y}=1| S=1\right] =\mathbb {P}\left[ \hat{Y}=1| S=0\right] \end{aligned}$$
(6)

Imposing demographic parity often has a strong negative impact on accuracy; consequently, more refined notions were proposed afterwards, in particular equalized odds and equal opportunity [28].

Definition 2.2

A system satisfies equalized odds if its prediction \(\hat{Y}\) is conditionally independent of the sensitive attribute S given the target label Y,

$$\begin{aligned} \mathbb {P}\left[ \hat{Y}=1|Y=y, S=1\right] =\mathbb {P}\left[ \hat{Y}=1|Y=y, S=0\right] , \quad y \in \{0, 1\} \end{aligned}$$
(7)

In other words, the notion of equalized odds requires the privileged and unprivileged groups to have equal true positive rates and equal false positive rates.

Equal opportunity is a relaxation of equalized odds, in the sense that it only requires equal true positive rates across the groups.

Definition 2.3

Equal opportunity is satisfied by a system if its prediction \(\hat{Y}\) is conditionally independent of the sensitive attribute S given the target label Y

$$\begin{aligned} \mathbb {P}\left[ \hat{Y}=1|Y=1, S=1\right] =\mathbb {P}\left[ \hat{Y}=1|Y=1, S=0\right] \end{aligned}$$
(8)

In practice, however, it is difficult to obtain perfect equality for any of the aforementioned notions. Hence, typically the aim is to minimize the absolute value of the difference between the privileged and unprivileged groups, rather than requiring this difference to be exactly zero. For instance, the demographic parity difference is defined as

$$\begin{aligned} \left|\mathbb {P}\left[ \hat{Y}=1| S=1\right] - \mathbb {P}\left[ \hat{Y}=1| S=0\right] \right|\end{aligned}$$
(9)

and similarly for the equalized odds difference and the equal opportunity difference.
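For reference, here is a minimal sketch (our own helpers, not part of the paper’s codebase) of how these three differences can be computed from binary predictions, labels, and a binary sensitive attribute; for the equalized odds difference we follow the common convention of taking the maximum of the true-positive-rate and false-positive-rate gaps.

```python
import numpy as np

def _rate(y_pred, mask):
    """P[Y_hat = 1] restricted to the samples selected by the boolean mask."""
    return float(y_pred[mask].mean()) if mask.any() else 0.0

def demographic_parity_diff(y_pred, s):
    return abs(_rate(y_pred, s == 1) - _rate(y_pred, s == 0))

def equal_opportunity_diff(y_true, y_pred, s):
    # Gap in true positive rates between privileged (s == 1) and unprivileged (s == 0).
    return abs(_rate(y_pred, (s == 1) & (y_true == 1)) -
               _rate(y_pred, (s == 0) & (y_true == 1)))

def equalized_odds_diff(y_true, y_pred, s):
    # Maximum of the true-positive-rate gap and the false-positive-rate gap.
    fpr_gap = abs(_rate(y_pred, (s == 1) & (y_true == 0)) -
                  _rate(y_pred, (s == 0) & (y_true == 0)))
    return max(equal_opportunity_diff(y_true, y_pred, s), fpr_gap)
```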

Related Works

Federated optimization has demonstrated suboptimal performance when the local datasets consist of samples from non-congruent distributions, resulting in the inability to simultaneously minimize both client-level and global objectives. Previous studies [4,5,6] examined various meta-algorithms for personalization, but their claim of preserving user privacy rests solely on clients releasing updated models or model updates instead of raw data, which still exposes users to significant risks. To address this issue, several works have focused on privatizing the (federated) optimization algorithm within the framework of DP [12, 13, 29, 30], adopting DP to defend against an honest-but-curious adversary. However, even in this setting, there is no guarantee of protection against sample reconstruction from the local datasets using client updates, as highlighted in [9]. Various strategies have been explored to offer local privacy guarantees, either through cryptographic approaches [31] or within the framework of local DP [14, 32, 33]. Specifically, in [33], the authors tackle the problem of personalized and locally differentially private federated learning, but only for simple convex, 1-Lipschitz cost functions of the inputs. This assumption is unrealistic for the majority of machine learning models and excludes many statistical modelling techniques, particularly neural networks. Finally, some research has focused on designing architectures capable of providing private computing environments for remote users [34], often making use of trusted platform modules, secure processors [35], or similar mechanisms [36] that improve efficiency by enforcing encryption on network transmissions rather than memory accesses. For example, the latter work conceptualizes an architecture that could be leveraged to deploy a server that can reveal the data being processed only to the clients that instantiated the server. It should be noted, however, that cryptographic guarantees of security are orthogonal to the privacy notions of differential privacy and its generalizations. To summarize and provide context for this work, Table 1 offers a qualitative evaluation of relevant research and of how the contributions presented in this paper fit among them.

Table 1 Qualitative comparison with the most relevant prior research on the topic

Of late, a great deal of attention has been devoted to studying and understanding the aspects of fairness in machine learning [23, 37,38,39,40,41,42]. Most of the research on fairness focuses on developing techniques to mitigate bias in machine learning algorithms. These techniques can be categorized into three main approaches: pre-processing, in-processing, and post-processing. Pre-processing techniques [43, 44] aim to generate a less biased dataset by modifying the values or adjusting the sampling process. In-processing techniques [45, 46] optimize the objective function while taking into account discrimination-aware regularizers. Post-processing techniques [47, 48] adjust the trained model to produce fairer outcomes. However, the majority of these studies primarily target centralized machine learning models as opposed to FL. Furthermore, there is a lack of research exploring the interplay between accuracy and fairness [40, 41] or privacy and fairness [23, 49]. In particular, to the best of our knowledge, disproportionately fewer works have investigated the relationship between privacy and fairness. The authors of [23] formally proved that privacy and fairness can be at odds with each other when non-trivial accuracy is required. A few recent works on group fairness in FL have emerged [38, 39], but they do not consider the privacy-fairness trade-off.

An Algorithm for Private and Personalized Federated Learning

Algorithm 1 aims to enable personalized federated learning while ensuring local privacy guarantees to preserve group privacy. In this context, locality refers to the sanitization of client information before it is shared with the server, while group privacy pertains to the notion of indistinguishability within a specific neighbourhood of clients, defined based on a particular distance metric. To clarify our terminology, we provide definitions for neighbourhood and group as follows:

Definition 4.1

For any model parameterized by \(\theta _0 \in \mathbb {R}^n\), the r-neighbourhood is defined as the set of points in the parameter space that are within an \(L_2\) distance of r or less from \(\theta _0\), i.e., \(\{\theta \in \mathbb {R}^n :\left\| \theta _0 -\theta \right\| _2\le r\}\). Clients whose models are parameterized by \(\theta \in \mathbb {R}^n\) within the same r-neighbourhood are considered to be part of the same group or cluster.

Algorithm 1 is inspired by the Iterative Federated Clustering Algorithm (IFCA) proposed in [4] and extends it by incorporating formal privacy guarantees. The key modifications include the introduction of the SanitizeUpdate function, as described in Algorithm 2, and the utilization of k-means for server-side clustering of the updated models.

Algorithm 1
figure a

An algorithm for personalized federated learning with formal privacy guarantees in local neighbourhoods.

Algorithm 2
figure b

SanitizeUpdate obfuscates a vector \(\theta \in \mathbb {R}^n\) with Laplace noise centered at 0 and tuned to the radius of a given neighbourhood.

The Laplace Mechanism Under Euclidean Distance in \(\mathbb {R}^n\)

The SanitizeUpdate function in Algorithm 2 is based on a generalization of the Laplace mechanism to \(\mathbb {R}^n\) under the Euclidean distance, which was originally introduced in [50] for geo-indistinguishability in \(\mathbb {R}^2\). The decision to utilize the \(L_2\) norm as the distance measure serves two main purposes.

First, clustering is performed on the vector space \(\mathbb {R}^n\) of parameters using the k-means algorithm, which relies on the Euclidean distance. Since clusters or groups of users are defined based on the proximity of their model parameters in the \(L_2\) norm, the procedure needs a d-privacy mechanism that obscures the reported values within each group while still enabling the server to distinguish among users belonging to different clusters.

Second, the use of equidistant noise vectors in the \(L_2\) norm for sanitizing the parameters ensures equiprobability by construction. This property leads to the same bound on the increase of the cost function in first-order approximation, as demonstrated in Proposition 4.2. The Laplace mechanism under Euclidean distance in the general space \(\mathbb {R}^n\) is formally defined in Proposition 4.1.
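As an illustration of the server-side clustering step described here, the sketch below (ours; it assumes the sanitized parameter vectors have already been collected from the clients and uses scikit-learn’s KMeans) groups the clients by the Euclidean proximity of their sanitized parameters and averages within each cluster to form the next set of hypotheses.

```python
import numpy as np
from sklearn.cluster import KMeans

def server_aggregate(sanitized_params, k):
    """Cluster the clients' sanitized parameter vectors (shape [num_clients, n])
    with k-means under the Euclidean distance and return one averaged hypothesis
    per cluster, together with each client's cluster assignment."""
    sanitized_params = np.asarray(sanitized_params, dtype=float)
    km = KMeans(n_clusters=k, n_init=10).fit(sanitized_params)
    hypotheses = np.stack([
        sanitized_params[km.labels_ == j].mean(axis=0) for j in range(k)
    ])
    return hypotheses, km.labels_
```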

Proposition 4.1

Let \(\mathcal {L}_{\varepsilon }:\mathbb {R}^n \mapsto \mathbb {R}^n\) be the Laplace mechanism with distribution \(\mathcal {L}_{x_0,\varepsilon }(x) = \mathbb {P}\left[ \mathcal {L}_{\varepsilon }(x_0)=x\right] = Ke^{-\varepsilon d(x,x_0)}\), with \(d(\cdot ,\cdot )\) the Euclidean distance. If \(\rho \sim \mathcal {L}_{x_0, \varepsilon }\), then:

  1.

    \(\mathcal {L}_{x_0,\varepsilon }\) is \(\varepsilon\)-d-private and \(K =\frac{\varepsilon ^n\Gamma (\frac{n}{2})}{2\pi ^{\frac{n}{2}}\Gamma (n)}\)

  2.

    \(\left\| \rho \right\| _2 \sim \gamma _{\varepsilon ,n}(r) = \frac{\varepsilon ^n e^{-\varepsilon r} r^{n-1}}{\Gamma (n)}\)

  3.

    The \(i^{th}\) component of \(\rho\) has variance \(\sigma _{\rho _i}^2 = \frac{n+1}{\varepsilon ^2}\)


where \(\Gamma (n)\) is the Gamma function, defined for positive reals as \(\int _{0}^{\infty }t^{n-1}e^{-t}\, dt\), which satisfies \(\Gamma (n) = (n-1)!\) whenever \(n \in \mathbb {N}\).

Proof

The proof can be found in Appendix A of [20]. \(\square\)

Proposition 4.2

Let \(y = f(x,\theta )\) be the fitting function of a machine learning model parameterized by \(\theta\), and \((X,Y) = Z\) the dataset over which the RMSE loss function \(F(Z,\theta )\) is to be minimized, with \(x\in X\) and \(y \in Y\). If \(\rho \sim \mathcal {L}_{0, \varepsilon }\), the bound on the increase of the cost function does not depend on the direction of \(\rho\), in first-order approximation, and:

$$\begin{aligned} \begin{aligned}&\left\| F(Z,\theta + \rho ) \right\| _2 - \left\| F(Z,\theta ) \right\| _2 \le \\& \quad \left\| J_f(X,\theta ) \right\| _2 \left\| \rho \right\| _2 + o( \left\| J_f(X,\theta )\cdot \rho \right\| _2) \end{aligned} \end{aligned}$$
(10)

Proof

The proof can be found in Appendix A of [20]. \(\square\)

The results in Proposition 4.1 make it possible to reduce the problem of sampling a point from this Laplace distribution to (i) sampling the norm of the point according to the second item of Proposition 4.1 and then (ii) sampling a unit (directional) vector uniformly from the hypersphere in \(\mathbb {R}^n\). Much like DP, d-privacy provides a means to compute the total privacy parameters in case of repeated queries, a result known as the Compositionality Theorem for d-privacy.

Theorem 4.1

Let \(\mathcal {K}_i\) be an \(\varepsilon _i\)-d-private mechanism for \(i\in \{1,2\}\). Then their independent composition is \((\varepsilon _1+\varepsilon _2)\)-d-private.

Proof

The proof can be found in Appendix A of [20]. \(\square\)
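Returning to the two-step sampling procedure described after Proposition 4.1, a minimal NumPy sketch might look as follows (ours, for illustration only): the norm is drawn from \(\gamma _{\varepsilon ,n}\), which is a Gamma distribution with shape n and scale \(1/\varepsilon\), and the direction is obtained by normalizing a standard Gaussian vector.

```python
import numpy as np

def sample_d_private_laplace(x0, eps, rng=None):
    """One sample from the eps-d-private Laplace mechanism in R^n centered at x0
    (Euclidean distance), following Proposition 4.1:
    (i)  draw the norm r from gamma_{eps,n}, i.e. Gamma(shape=n, scale=1/eps);
    (ii) draw a direction uniformly on the unit hypersphere S^{n-1}."""
    rng = np.random.default_rng() if rng is None else rng
    x0 = np.asarray(x0, dtype=float)
    n = x0.size
    r = rng.gamma(shape=n, scale=1.0 / eps)  # norm of the noise vector
    u = rng.normal(size=n)
    u /= np.linalg.norm(u)                   # uniform unit direction
    return x0 + r * u
```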

A Heuristic for Defining the Neighbourhood of a Client

During the t-th iteration, when a user c invokes the SanitizeUpdate procedure in Algorithm 2, it has already received a set of hypotheses, selected the one that best fits its data distribution, \(\theta _{\bar{j}}^{(t)}\), optimized it locally, and obtained \(\theta _{\bar{j}, c}^{(t)}\). It is reasonable to assume that clients whose datasets are sampled from the same underlying data distribution \(\mathcal {D}_{\bar{j}}\) will perform an update similar to \(\delta _c^{(t)}\). Therefore, points which are within the \(\Vert \delta _c^{(t)}\Vert _2\)-neighbourhood of \(\hat{\theta }_{\bar{j}, c}^{(t)}\) are forced to be indistinguishable. To provide this guarantee, the Laplace mechanism is tuned such that the points within the neighbourhood are indistinguishable up to a privacy level of \(\varepsilon \Vert \delta _c^{(t)}\Vert _2\). Choosing \(\varepsilon = n/(\nu \Vert \delta _c^{(t)}\Vert _2)\) yields \(\varepsilon \Vert \delta _c^{(t)}\Vert _2 = n/\nu\), where \(\nu\) is referred to as the noise multiplier. Notably, a larger value of \(\nu\) corresponds to a stronger privacy guarantee. This is because the norm of the noise vector sampled from the Laplace distribution follows the distribution specified in Proposition 4.1, with an expected value of \(\mathbb {E}\left[ \gamma _{\varepsilon , n}(r)\right] = n/\varepsilon\).
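For concreteness, the following sketch (ours, relying on the hypothetical sample_d_private_laplace helper from the previous sketch) shows how this heuristic could be wired into a SanitizeUpdate-style step; it returns the sanitized parameters together with the resulting leakage \(\varepsilon \Vert \delta _c^{(t)}\Vert _2 = n/\nu\).

```python
import numpy as np

def sanitize_update(theta_new, theta_received, nu, rng=None):
    """Obfuscate the locally optimized parameters theta_new before release,
    tuning eps so that the leakage within the ||delta||_2-neighbourhood is
    eps * ||delta||_2 = n / nu (delta is the local update; assumed non-zero)."""
    theta_new = np.asarray(theta_new, dtype=float)
    delta = theta_new - np.asarray(theta_received, dtype=float)
    n = theta_new.size
    eps = n / (nu * np.linalg.norm(delta))   # heuristic from the text
    sanitized = sample_d_private_laplace(theta_new, eps, rng)
    return sanitized, eps * np.linalg.norm(delta)
```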

Experiments

This section discusses a number of experimental validations of Algorithm 1 on different tasks and datasets. Detailed experimental settings are discussed in Appendix B of [20], but we provide here an overview of the hardware and software stacks: all the following experiments are run on a local server running Ubuntu 20.04.3 LTS with an AMD EPYC 7282 16-Core processor, 1.5 TB of RAM, and \(8\times\) NVIDIA A100 GPUs. Python and PyTorch are the main software tools adopted for simulating the federation of clients and their corresponding collaborative training.

Characterizing Privacy

In this section, we evaluate the trade-off involved in training personalized federated learning models under formal local privacy guarantees.

Synthetic Data

Data is generated according to \(k=2\) different distributions: \(y = x^T\theta _i^* + u\) with \(u \sim \text {Uniform}\left[ 0, 1\right)\), \(\forall i\in \{1,2\}\), where \(\theta _1^* = \left[ +5, +6\right] ^T\) and \(\theta _2^* = \left[ +4, -4.5\right] ^T\). We then assess how training progresses as we move from Federated Averaging [51] (Fig. 1a–c) to IFCA (Fig. 1d–f), and finally to Algorithm 1 (Fig. 1g–i). When utilizing Federated Averaging, a noticeable issue arises: relying on a single hypothesis fails to capture the diversity present in the data distributions. As a result, the final parameters tend to settle somewhere between the optimal parameter values (see Fig. 1b). Conversely, employing IFCA demonstrates that having multiple initial hypotheses enhances performance, particularly when clients possess heterogeneous data. This is evident from the nearly overlapping optimized client parameters with the true optimal parameters (see Fig. 1e).

By adopting our algorithm instead, not only do we provide formal guarantees, but we also achieve remarkable outcomes in terms of proximity to the optimal parameters (see Fig. 1h) and reduction of the loss function (see Fig. 1i). To assess privacy infringement, Fig. 2 illustrates the maximum level of privacy leakage incurred by clients per cluster.

Fig. 1
figure 1

(From [20]) Learning federated linear models with: (a–c) one initial hypothesis and non-sanitized communication, (d–f) two initial hypotheses and non-sanitized communication, (g–i) two initial hypotheses and sanitized communication. The first two figures of each row show the parameter vectors released by the clients to the server

Fig. 2
figure 2

(From [20]) Synthetic data: max privacy leakage among clients. Privacy leakage is constant when clients with the largest privacy leakage are not sampled (by chance) to participate in those rounds

Hospital Charge Data

This experiment utilizes the Hospital Charge Dataset obtained from the Centers for Medicare and Medicaid Services of the US Government [52]. Here, the healthcare providers are regarded as the clients who participate in training a machine learning model through federated learning. The objective is to predict the cost of a medical service based on its location in the country and the specific procedure involved.

To evaluate the trade-off between privacy, personalization, and accuracy, we explore various numbers of initial hypotheses since the number of underlying data distributions is unknown a priori. Accuracy is assessed at different levels of the noise multiplier \(\nu\). Notably, using Algorithm 1 with only one hypothesis yields the Federated Averaging algorithm. As depicted in Fig. 3, employing multiple hypotheses significantly reduces the RMSE loss function, particularly when transitioning from one to three hypotheses. Furthermore, we emphasize that increasing the number of hypotheses also helps mitigate the impact of the noise multiplier, even at high levels (as shown on the right-hand side of the figure). This highlights the importance of adopting formal privacy guarantees when a slight increase in the cost function is acceptable. The empirical distribution of privacy leakage among clients involved in a specific training configuration is illustrated in Fig. 4. Table 2 presents privacy leakage statistics across multiple rounds and configurations.

Fig. 3
figure 3

(From [20]) RMSE for models trained with Algorithm 1 on the Hospital Charge Dataset. Error bars show \(\pm \sigma\), with \(\sigma\) the empirical standard deviation. Lower RMSE values are better for accuracy

Fig. 4
figure 4

(From [20]) Hospital charge data: the empirical distribution of the privacy budget over the clients for \(\nu =3\), 5 initial hypotheses, seed \(=3\), r is the radius of the neighbourhood, the total number of clients is 2062

Table 2 (From [20]) Hospital charge data: median and maximum local privacy budgets over the whole set of clients, averaged over 10 runs with different seeds

FEMNIST Image Classification

 

Table 3 (From [20]) Effects of increasing the noise multiplier on the validation accuracy and standard deviation
Fig. 5
figure 5

(From [20]) Effects of the Laplace mechanism in Proposition 4.1 with different noise multipliers as a defence strategy against the DLG attack

This task involves character recognition from images using the FEMNIST dataset [53]. For any practical range of noise multipliers \(\nu\), the resulting privacy leakage \(\varepsilon \Vert \delta _c^{(t)} \Vert _2 = n/\nu\) would be exceptionally large, given the CNN’s \(n=206590\) parameters. Consequently, the mechanism cannot provide meaningful theoretical privacy guarantees in this setting. This issue is commonly encountered with local privacy mechanisms [54], as the expected value of the noise vector’s norm, \(\mathbb {E}\left[ \gamma _{\varepsilon , n}(r)\right] = n/\varepsilon\), depends linearly on n.
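To see why the guarantee becomes vacuous at this scale, one can plug the CNN’s parameter count into \(\varepsilon \Vert \delta _c^{(t)} \Vert _2 = n/\nu\) for the noise multipliers considered below (a back-of-the-envelope calculation, not taken from the paper):

```python
n = 206590                          # number of CNN parameters
for nu in (1e-3, 1e-1, 1.0):
    print(f"nu = {nu:>6}: leakage n/nu = {n / nu:,.0f}")
# nu =  0.001: leakage n/nu = 206,590,000
# nu =    0.1: leakage n/nu = 2,065,900
# nu =    1.0: leakage n/nu = 206,590
```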

However, it is still possible to evaluate, in practice, whether this specific generalization of the Laplace mechanism can effectively defend against a particular attack known as DLG [9]. The outcomes of varying noise multiplier values are presented in Fig. 5, and Table 3 provides additional details. Notably, when \(\nu = 10^{-3}\), the ground truth image can be fully reconstructed. Partial reconstruction remains possible up to \(\nu = 10^{-1}\). However, for \(\nu \ge 1\), experimental results demonstrate the failure of the DLG attack to reconstruct input samples when the communication between the client and server is protected by the mechanism outlined in Proposition 4.1.

Fairness Analysis

In this section, we analyze how group fairness improves with the personalization of the trained models under d-privacy guarantees when there are two groups with different data distributions. Experiments were performed on synthetic data and the FEMNIST image classification dataset that was used in Section “Characterizing Privacy”. To ensure a thorough evaluation, we considered a variety of group fairness metrics in the experiments. In particular, we measured the fairness with respect to equal opportunity [28], equalized odds [28], and demographic parity [27] as explained in Section “Fairness”.

In particular, in Figs. 7 and 8, the X-axis denotes the noise multiplier \(\nu\), representing the amount of d-private noise added to the local updates as explained in Section “A Heuristic for Defining the Neighbourhood of a Client”, and the Y-axis denotes the absolute value of the difference between the privileged and unprivileged groups with respect to the different metrics of group fairness that we considered.

Synthetic Data

Fig. 6
figure 6

The first two plots from the left illustrate the spatial distribution of the samples in \(g_1\) and \(g_2\), respectively, and the third plot shows \(g_1\) and \(g_2\) superimposed together in the same space

Synthetic data was generated in a manner similar to that in Section “Synthetic Data”, with the following modifications to enable us to investigate the aspect of group fairness fostered by our method: i) The total number of users is 1000 and each user holds 10 samples. 800 users have data generated according to the distribution \(y = x^T\theta _1 + u\) with \(u \sim \text {Uniform}\left[ 0, 1\right)\), and form the privileged majority group \(g_1\). The remaining 200 users have data generated according to the distribution \(y = x^T\theta _2 +15 + u\) with \(u \sim \text {Uniform}\left[ 0, 1\right)\), and form the unprivileged minority group \(g_2\). In this case, the sensitive attribute considered to evaluate fairness is the group id G, where \(G\in \{g_1, g_2\}\). ii) For binary classification, labels are set using \(z={\text {Sigmoid}}(y)\) for each \(y \in Y\). In the case of \(g_1\), we assign the label 1 if the value of z is greater than or equal to 0.5 and the label 0 otherwise. In the case of \(g_2\), on the other hand, the label 1 is assigned when \(z={\text {Sigmoid}}(y-15)\) is less than or equal to 0.5, and the label 0 is assigned otherwise. This setting simulates a situation in which discrimination occurs depending on sensitive attributes in the real world, such as minorities experiencing a higher loan rejection rate than white applicants with the same property [55]. Thus, in our experiment, label 1 could be interpreted as “loan approved” and label 0 as “loan denied”. The data generated in this way are shown in Fig. 6.
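A minimal sketch of this data-generation procedure is given below (ours; the values of \(\theta _1\) and \(\theta _2\) and the distribution of x are assumptions made for illustration, since the text only fixes the group sizes, the noise term, and the labelling rule).

```python
import numpy as np

rng = np.random.default_rng(0)
theta_1 = np.array([5.0, 6.0])   # assumed, as in the earlier synthetic experiment
theta_2 = np.array([4.0, -4.5])  # assumed

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def make_user(theta, offset, flip, n_samples=10):
    """One user's dataset: y = x^T theta + offset + u, with u ~ Uniform[0, 1)."""
    x = rng.normal(size=(n_samples, 2))                 # assumed feature distribution
    y = x @ theta + offset + rng.uniform(0.0, 1.0, size=n_samples)
    z = sigmoid(y - offset)                             # g1: offset 0; g2: offset 15
    labels = (z <= 0.5).astype(int) if flip else (z >= 0.5).astype(int)
    return x, y, labels

g1 = [make_user(theta_1, offset=0.0,  flip=False) for _ in range(800)]  # privileged
g2 = [make_user(theta_2, offset=15.0, flip=True)  for _ in range(200)]  # unprivileged
```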

We compared the fairness for two cases: one with a single hypothesis (no personalization) and the other with two hypotheses (with personalization), in the framework of Algorithm 1. The experimental results are shown in Fig. 7.

Fig. 7
figure 7

The figure shows the comparison between the personalized and non-personalized models for (from left) equal opportunity, equalized odds, and demographic parity, respectively. Experiments were performed for noise multipliers \(\nu\) of 0.1, 1, 2, and 4. For all the metrics of fairness and the values of the noise multiplier, the personalized model is seen to possess improved fairness over the non-personalized model

The results illustrated in Fig. 7 show that the personalization of models (i.e., Algorithm 1) enhances group fairness under all the metrics and all the levels of formal privacy guarantees compared to the non-personalized model. A major reason behind this significant improvement is that, unlike the non-personalized model, which trains on data from both groups and is therefore biased towards the majority group \(g_1\), the personalized model optimizes for each group’s data distribution without disregarding the minority group \(g_2\). We also observe that fairness deteriorates as the value of the noise multiplier increases, as we would expect. This is presumably due to the decreasing influence of the minority group \(g_2\) as the amount of noise increases, which is consistent with the philosophy behind and the definition of DP and its variants. Furthermore, we observe, interestingly, that the personalized model ensures better fairness than the non-personalized model even with the highest level of privacy protection. This shows that personalization in FL under d-privacy can be a comprehensive solution towards privacy-preserving and ethical machine learning, as it provides both privacy guarantees and enhanced fairness.

 

Fig. 8
figure 8

The figure shows the comparison between the personalized and non-personalized models for equal opportunity, equalized odds, and demographic parity. Experiments were performed for noise multipliers \(\nu\) of 0.1, 1, 2, and 4. For all metrics of fairness and values of the noise multiplier, the personalized model improved fairness over the non-personalized model

FEMNIST Image Classification


To evaluate the fairness of our method on real datasets, we considered the FEMNIST image classification dataset in the same form as in Section “FEMNIST Image Classification”. As in the experiments performed with the synthetic data in Section “Synthetic Data”, the sizes of the privileged and unprivileged groups were different, denoting the existence of a majority and a minority in the population. In this part, the rotated images are set as the unprivileged group \(g_2\), with a total of 382 sampled users forming only \(20\%\) of all users, and the un-rotated images represent the privileged group \(g_1\), with a total of 1736 users. As in the case of the synthetic data considered before, the group membership is used as the sensitive attribute. In the case of \(g_1\), we assign label 1 if the FEMNIST image label is even and 0 if it is odd. For \(g_2\), we assign label 0 if the FEMNIST image label is even and label 1 if it is odd. The experimental results are given in Fig. 8.

We observe that personalized model training exhibits significantly better group fairness across all metrics compared to its non-personalized counterpart. The change in fairness due to the amount of noise added was not as pronounced as in the case of the synthetic dataset, but fairness was still observed to deteriorate with an increase in the value of the noise multiplier. Personalized model training in FL under the highest level of privacy is still observed to achieve better fairness across all the metrics than (non-personalized) models trained in a classical FL framework even with no privacy, similar to what we observed in the experiments with the synthetic data.

Conclusion

This work builds upon our previous research on personalized federated learning with metric privacy guarantees. To ensure the privacy of ML model parameters during transmission, we employ d-privacy techniques for sanitization. The objective of this process is to generate personalized models that converge to optimal parameters, catering to the diverse datasets present in the federated learning setting. Given the presence of multiple, unknown data distributions among the individuals participating in the federated learning process, we make a reasonable assumption of a mixture of these distributions. To effectively aggregate clients with similar data distributions, we employ a clustering approach using k-means on the sanitized parameter vectors. This method proves suitable because d-private mechanisms preserve the underlying topology of the true value domain. Notably, our mechanism shows particular promise for machine learning models with a relatively small number of parameters. Although the formal privacy guarantees diminish with larger models, experimental results demonstrate the effectiveness of the Laplace mechanism against the DLG attack.

In addition to metric privacy guarantees, we also evaluate the fairness of machine learning models trained using personalized federated learning and d-privacy. Our study assesses various group fairness metrics, including equal opportunity, equalized odds, and demographic parity. The consistent findings demonstrate that personalized models significantly improve group fairness across all evaluated metrics and privacy levels. Moreover, unlike non-personalized models, they optimize for each group’s specific data distribution, effectively mitigating biases towards the majority group. Consequently, significant advancements in fairness are achieved through this approach.

The level of fairness is influenced by the incorporation of d-private noise in the local updates. As the noise increases, the influence of the minority group decreases, resulting in a deterioration of fairness. This behaviour aligns with the principles of differential privacy and the expected impact of noise addition on group fairness. Remarkably, even with the highest level of privacy protection, personalized models consistently maintain superior fairness compared to non-personalized models. This observation highlights the potential of personalized model training in federated learning under d-privacy as a comprehensive solution for privacy-preserving and ethical machine learning. By offering privacy guarantees alongside enhanced fairness, personalized models demonstrate their effectiveness in balancing these critical aspects.