1 Introduction

Data-driven machine learning algorithms have recently seen unprecedented success in fields such as computer vision, natural language processing, search engines, and recommendation systems. The related applications span many industries, including the Internet, security, medicine, transportation, and finance, and contribute strongly to societal development. However, in some scenarios, e.g., offender risk assessment, salary prediction, or loan approval, machine learning models used for automated decision-making may be socially biased against groups with specific genders or races. The reason for this social bias is that the training data used by machine learning models are sampled from real life or are synthetic (simulated real samples). These data encode people's preferences or biases to varying degrees, and the evaluation metrics used during training cause the models to amplify them [1]. As a result, models trained on such biased datasets exhibit discriminatory behavior, giving different predictive tendencies to people with different sensitive attributes (e.g., gender, race, and age) [2]. In addition, discrimination can be direct or indirect. The latter consists of two aspects, namely proxy discrimination and disparate impact [3]: proxy discrimination results from the correlation between sensitive attributes and other attributes, while disparate impact refers to the correlation between sensitive attributes and the output. Therefore, removing sensitive attributes can only eliminate direct discrimination; it cannot resolve the unfairness caused by indirect discrimination.

Currently, many fair machine learning approaches prevent discrimination by modifying the training data to eliminate biases associated with sensitive attributes [4, 5]. In particular, adversarial training [6] mitigates biases by learning new representations from which the target variables can be predicted but the sensitive attributes cannot. In addition, since the Generative Adversarial Network (GAN) [7] has shown satisfactory results in generating high-quality synthetic data that is similar to real data [8, 9], many studies have attempted to construct fair datasets by combining adversarial training and GANs [10,11,12,13,14,15]. Learning fair classifiers from fair synthetic data has a significant advantage, since obtaining real datasets of good quality and fairness is extremely resource intensive in most learning scenarios.

However, for synthetic data, the fairer they are, the worse their utility. Therefore, the balance between data fairness and utility needs to be considered during model training. In addition, generating fair data can eliminate proxy discrimination well and achieve statistical fairness, but disparate impact, which involves the outputs of downstream tasks, can be further mitigated. Based on the above analysis, an information-minimizing generative adversarial network for fair generation and classification (FACGAN) is proposed. The contributions of this work are as follows:

(1) The model is trained by joint adversarial learning of the generator and the classifier to better eliminate indirect discrimination. The generator is trained to minimize the mutual information between non-sensitive and sensitive attributes, while the classifier minimizes the mutual information between predicted labels and sensitive attributes.

(2) A latent factor based on ANOVA is constructed to reduce the accuracy loss caused by the introduction of fairness constraints: sensitive attributes are removed from the model input, while non-sensitive attributes that are helpful for label prediction and have little impact on the prediction of sensitive information are selected to construct the latent factor.

(3) The experimental results show that our approach significantly improves the classification accuracy of fair data for downstream tasks and further improves the ability of models trained from fair datasets to perform fair classification.

2 Related Work

Achieving fairness in learning models is currently an imperative task in machine learning. Constructing fair datasets and training fair prediction models are two of the more prominent approaches to achieving fair machine learning. The former eliminates unfairness by detecting bias or discrimination in the training data and pre-modifying or re-representing the data during pre-processing. The latter adjusts, corrects, and refines the prediction model or algorithm during in-processing.

2.1 Fair Dataset Construction

For constructing fair datasets, modifying the training data is the most intuitive method, for example, correcting the class labels of certain individuals, reweighting samples to balance the data, or changing the sample sizes of different subgroups to eliminate biases [4, 5]. These methods overcome or mitigate discrimination in the training data to some extent, but they only pre-process the original training data and cannot generalize to unknown data. In contrast, the idea of combining adversarial training and GANs to generate unbiased representations has been addressed in several works; it does not require modifying the training data and maintains data integrity. Xu et al. [10] proposed a model called FairGAN, which adds an extra discriminator to the original GAN so that the generator is trained to generate fair data independent of sensitive attributes and thus achieve fair classification. Kairouz et al. [11] introduced a data-driven framework for learning fair universal representations, which guarantees statistical fairness for any unknown learning task; adversarial learning is used to generate fair representations with universal properties in which sensitive features are actively decoupled. Xu et al. [12] presented a causal fairness-aware GAN (CFGAN) to generate high-quality fair data. The CFGAN consists of two generators and two discriminators: the generators simulate the original causal and interventional models, while the discriminators are employed to achieve high data utility and causal fairness. Ngxande et al. [13] built a generative adversarial network to generate image data of black drivers to alleviate dataset imbalance and improve the accuracy of a driver fatigue detection model; however, it did not consider the issue of resource equity. Sattigeri et al. [14] proposed a new auxiliary-classifier GAN to generate high-dimensional image data, which enhances model training and makes learning more comprehensive and fair. Adeli et al. [15] removed sensitive attributes based on a domain adversarial network, using the Pearson correlation coefficient, which applies to continuously ordered variables, to measure the statistical dependence between attributes.

However, introducing adversarial training is equivalent to perturbing the data generated by the original GAN, which reduces the generation quality to some extent. In addition, constructing a fair dataset cannot fully guarantee fair classification for models trained on it: some classification-based fairness notions require constraining the outputs of downstream tasks and are therefore more suitable for prediction models.

2.2 Fair Prediction Model Construction

For training fair prediction models, some works mitigate bias in model predictions by adjusting the learning process and introducing constraints or regularization terms into the objective function [17, 18]. Currently, adversarial training is used to remove information about sensitive attributes from the intermediate representation of the model to obtain a fair classifier. Xu et al. [16] proposed an improved model, FairGAN\(^+\); by introducing an additional pair of classifier and discriminator, the classifier can be trained to achieve better fair classification. However, it does not remove sensitive attributes at the model input and fails to focus on the utility of the generated data, which reduces the effectiveness of de-biasing training and causes a larger accuracy loss for downstream tasks. Beutel et al. [19] removed sensitive information from the latent representations learned by a neural network using adversarial training to achieve fair classification. Zhang et al. [20] proposed a method for training unbiased, general, and powerful machine learning models that supports multiple fairness constraints regardless of the complexity of the prediction model or the nature (discrete or continuous) of the predicted and protected variables. Abusitta et al. [21] first analyzed which class of people in the dataset the classifier was biased against, then extracted the features of that class as a condition and used a GAN to generate synthetic data for the corresponding group to supplement the original dataset and obtain fair classification results. Delobelle et al. [22] proposed a model trained with gradient reversal to prevent the decision results from predicting the values of protected attributes while reducing the utility loss; new adversarial samples generated by evasion attacks are used to retrain and improve the model and address the problems caused by gradient reversal. Dhar et al. [23] proposed an adversarial gender de-biasing algorithm (AGENDA) to reduce the gender information in face descriptors produced by previously trained face recognition networks, thereby reducing the gender predictability of the descriptors and mitigating gender bias in face verification while maintaining reasonable recognition performance. Han et al. [24] introduced an improved IIoV framework (FIIoV) to address the issue that traditional fatigue detection models can be biased against certain groups and lead to inequitable resource allocation; the model detects driver fatigue using a convolutional neural network (CNN) and then adopts an adversarial network to achieve fairness.

Available studies usually train fair generation and prediction models separately. In fact, joint adversarial training of the generator and classifier allows each to improve the other. Therefore, the FACGAN combines the adversarial training of the generation and prediction models to generate fair representations and achieve fair classification. Furthermore, unlike existing methods, we pay additional attention to the accuracy loss caused by the introduction of fairness constraints: the FACGAN selects non-sensitive attributes that are helpful for label prediction and weakly associated with sensitive information to replace some noise dimensions of the model input, which mitigates the accuracy loss and better balances fairness and utility.

3 Preliminary

3.1 Generative Adversarial Network and Variants

3.1.1 GAN Model

The Generative Adversarial Network (GAN) [7] is an architecture for training generative models through adversarial networks. It consists of two parts, a generator and a discriminator, both of which are multilayer neural networks. Given a random noise variable \(z \sim P(z)\) as input, the generator G(z) tries to learn a generative distribution \(P_{g}\) that matches the real data distribution \(P_{real}(x)\). Meanwhile, the discriminator D is a binary classifier that predicts whether a data sample comes from the real data or from the fake data produced by the generator. The objective loss function of D is:

$$\begin{aligned} \begin{aligned} L_D = -E_{x \sim P_{real}(x)}[log D(x)] - E_{z \sim P(z)}[log(1-D(G(z)))] \end{aligned} \end{aligned}$$
(1)

where \(D(\cdot )\) outputs the probability that its input comes from the real data rather than the fake data. For the generator G(z), the objective loss function is:

$$\begin{aligned} \begin{aligned} L_G = E_{z \sim P(z)}[log(1-D(G(z)))] \end{aligned} \end{aligned}$$
(2)

Minimizing Eq. (2) drives the generator to fool the discriminator, so that D assigns a high probability to G(z) being real data. Thus, the GAN can be formalized as a minimax game with the value function:

$$\begin{aligned} \begin{aligned} min_G max_D V(G, D) = E_{x \sim P_{real}(x)}[log D(x)] + E_{z \sim P(z)}[log(1-D(G(z)))] \end{aligned} \end{aligned}$$
(3)
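To make Eqs. (1)-(3) concrete, the following PyTorch sketch performs one alternating update of D and G on a toy batch; the layer sizes, the stand-in data, and the non-saturating generator update are illustrative assumptions rather than the networks used later in this paper.

```python
import torch
import torch.nn as nn

# One alternating GAN update for Eqs. (1)-(3); all sizes and the stand-in batch are toy values.
noise_dim, data_dim, batch = 64, 32, 128
G = nn.Sequential(nn.Linear(noise_dim, 128), nn.ReLU(), nn.Linear(128, data_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, 1), nn.Sigmoid())
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
bce = nn.BCELoss()

real_x = torch.rand(batch, data_dim) * 2 - 1          # stand-in for a real mini-batch
z = torch.randn(batch, noise_dim)

# Discriminator step, Eq. (1): -E[log D(x)] - E[log(1 - D(G(z)))]
fake_x = G(z).detach()
loss_d = bce(D(real_x), torch.ones(batch, 1)) + bce(D(fake_x), torch.zeros(batch, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step, the non-saturating variant of Eq. (2): maximize log D(G(z))
loss_g = bce(D(G(z)), torch.ones(batch, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```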

3.1.2 ACGAN Model

The Auxiliary Classifier Generative Adversarial Network (ACGAN) [25] is a variant of GAN in which each synthetic sample is conditioned on a class label y appended to the noise z. The discriminator D must determine whether the input data are real and also classify the data into label classes, which is achieved by adding an auxiliary classifier that shares its network structure and parameters with D. The generator G can then generate samples that follow the real data distribution and the class-label condition. The objective function consists of two parts: judging whether the input data are real or fake and classifying them correctly. For D, the objective loss function is:

$$\begin{aligned} \begin{aligned} L_D =&-E[log P(S = real | X_{real})] - E[log P(S = fake | X_{fake})] \\&-E[log P(C = \dot{y} | X_{real})] - E[log P(C = \dot{y} | X_{fake})] \end{aligned} \end{aligned}$$
(4)

For G, it is:

$$\begin{aligned} L_G = E[log P(S = fake | X_{fake})] - E[log P(C = \dot{y} | X_{fake})] \end{aligned}$$
(5)
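As a hedged sketch of how the source and class terms in Eqs. (4)-(5) can be assembled, the snippet below assumes a two-headed discriminator that returns a real/fake probability and class logits; the function and argument names are hypothetical.

```python
import torch
import torch.nn.functional as F

# Loss terms of Eqs. (4)-(5), assuming a discriminator with two heads: a real/fake
# probability d_src (after a sigmoid) and class logits d_cls; y_real / y_fake are the
# integer class labels of the real batch and of the labels used to condition G.
def acgan_d_loss(d_src_real, d_src_fake, d_cls_real, d_cls_fake, y_real, y_fake):
    source = F.binary_cross_entropy(d_src_real, torch.ones_like(d_src_real)) + \
             F.binary_cross_entropy(d_src_fake, torch.zeros_like(d_src_fake))
    klass = F.cross_entropy(d_cls_real, y_real) + F.cross_entropy(d_cls_fake, y_fake)
    return source + klass                             # Eq. (4)

def acgan_g_loss(d_src_fake, d_cls_fake, y_fake):
    # G wants its samples judged real (non-saturating form) and classified as their label, Eq. (5)
    return F.binary_cross_entropy(d_src_fake, torch.ones_like(d_src_fake)) + \
           F.cross_entropy(d_cls_fake, y_fake)
```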

3.1.3 InfoGAN Model

Since the input z of the generator G in GAN is an unconstrained continuous noise signal, the GAN has poor interpretability: it cannot map specific dimensions of z to semantic features of the output. In contrast, the Information-maximizing Generative Adversarial Network (InfoGAN) [26] splits the input into incompressible noise z and an interpretable latent factor c. By constraining the relationship between c and the synthetic data, c can be made to contain interpretable information about the data. The objective function of the model is:

$$\begin{aligned} min_G max_D V_I(G, D) = V(G, D) - \lambda I(c, G(z, c)) \end{aligned}$$
(6)

where I(c, G(z, c)) denotes the mutual information between c and the synthetic data; the larger I(c, G(z, c)), the higher the correlation between c and the synthetic data. In InfoGAN, the auxiliary network aims to infer the value of the true latent factor from the synthetic data. In contrast, the proposed approach aims to minimize the ability of an adversary to infer sensitive information from the synthetic data and the predicted labels, with the goal of better addressing indirect discrimination caused by proxies and disparate impact.
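In practice, InfoGAN does not compute I(c, G(z, c)) exactly but maximizes a variational lower bound through an auxiliary head Q that reconstructs c from the synthetic sample. The sketch below illustrates that regularizer for a continuous code under a fixed-variance Gaussian posterior; all module names and sizes are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Variational surrogate for the mutual-information term I(c, G(z, c)) in Eq. (6):
# an auxiliary head Q reconstructs the latent code from the synthetic sample, and
# -E[log Q(c | G(z, c))] is minimized. Module names and sizes are placeholders.
noise_dim, code_dim, data_dim = 62, 2, 32
G = nn.Sequential(nn.Linear(noise_dim + code_dim, 128), nn.ReLU(), nn.Linear(128, data_dim))
Q = nn.Sequential(nn.Linear(data_dim, 128), nn.ReLU(), nn.Linear(128, code_dim))

z = torch.randn(16, noise_dim)
c = torch.rand(16, code_dim) * 2 - 1                 # continuous latent code in [-1, 1]
x_fake = G(torch.cat([z, c], dim=1))
mi_penalty = F.mse_loss(Q(x_fake), c)                # Gaussian log-likelihood up to constants
# Full generator objective: adversarial loss + lambda * mi_penalty (i.e. maximize the MI bound).
```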

3.2 Fairness Definitions

In machine learning fairness studies, there exist data-based and classification-based fairness notions. A labeled dataset D(X, Y, S) contains a set of non-sensitive attributes \(X \in R^n\), class labels \(Y \in \{0,1\}\), and sensitive attributes \(S \in \{0,1\}\). For ease of discussion, we take S and Y to be binary variables. We now introduce the fairness criteria used by the proposed model.

3.2.1 Definition of the Fair Dataset

\(\epsilon \)-Fairness quantifies data fairness by the error rate of predicting the sensitive attribute S from the non-sensitive attributes X: a lower error rate means that S can be accurately predicted from X, i.e., the data are less fair. For classifiers trained on synthetic datasets and tested on real datasets, classification fairness is achieved if the disparate impact of the real sensitive attributes has been removed from the synthetic dataset. We formally define \(\epsilon \)-Fairness below.

Definition 1

(\(\epsilon \)-Fairness) [27] A data distribution D(X, Y, S) satisfies \(\epsilon \)-Fairness if \(BER(C(X), S) > \epsilon \) for any classifier \(C: X \rightarrow S\). \(\epsilon \)-Fairness measures the potential discrimination caused by the correlation between the non-sensitive attributes X and the sensitive attribute S, where BER is the average class-conditional error (balanced error rate) on the data distribution D(X, Y, S). It can be expressed as Eq. (7).

$$\begin{aligned} BER(C(X), S) = [P(C(X) = 0|S = 1) + P(C(X) = 1|S = 0)] / 2 \end{aligned}$$
(7)
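As a small sketch of how Definition 1 can be checked in practice, the snippet below trains an auxiliary logistic-regression classifier to predict S from X (assumed to be numpy arrays) and computes the balanced error rate of Eq. (7) on held-out data; the splitting strategy and classifier choice are assumptions, not the paper's exact protocol.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Estimate the balanced error rate of Eq. (7) with an auxiliary classifier C: X -> S.
def balanced_error_rate(X, S):
    X_tr, X_te, s_tr, s_te = train_test_split(X, S, test_size=0.3, random_state=0)
    pred = LogisticRegression(max_iter=1000).fit(X_tr, s_tr).predict(X_te)
    fnr = np.mean(pred[s_te == 1] == 0)   # P(C(X) = 0 | S = 1)
    fpr = np.mean(pred[s_te == 0] == 1)   # P(C(X) = 1 | S = 0)
    return (fnr + fpr) / 2                # the data satisfy epsilon-fairness if BER > epsilon
```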

3.2.2 Definition of the Fair Classification

Commonly used measures of classification fairness are Demographic Parity and Equalized Odds (and their variants). Demographic Parity requires complete independence between the prediction outcome and the sensitive attribute S to achieve group fairness. Equalized Odds differs from Demographic Parity in that the true label Y of each sample in the training set is used as prior knowledge. We formally define Demographic Parity and Equalized Odds below.

Definition 2

(Demographic Parity) [28] A classifier C trained on the data distribution D(X, Y, S) satisfies Demographic Parity if the classification result C(X) is independent of the sensitive attribute S, i.e., \(P(C(X) =1| S=1) = P(C(X) =1| S=0)\), which can be relaxed as inequality (8), where the threshold \(\tau \) serves as a fairness constraint.

$$\begin{aligned} |P(C(X) = 1 | S = 1) - P(C(X) = 1 | S = 0)| \le \tau \end{aligned}$$
(8)

Definition 3

(Equalized Odds) [28] A classifier C trained on the data distribution D(X, Y, S) satisfies Equalized Odds if the classification result C(X) is conditionally independent of the sensitive attribute S given the label Y, i.e., \(P(C(X) =1| Y=y, S=1) = P(C(X) =1| Y=y, S=0)\), which can be relaxed as inequality (9), where the threshold \(\tau \) serves as a fairness constraint.

$$\begin{aligned} \begin{aligned} |P(C(X)= 1|Y=y, S=1)-P(C(X)= 1|Y = y, S=0)| \le \tau ,y \in \{0, 1\} \end{aligned} \end{aligned}$$
(9)
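The two group-fairness gaps of Definitions 2 and 3 reduce to simple conditional means over binary predictions. The following sketch computes them from 0/1 numpy arrays `pred`, `s`, and `y`; the array layout is an assumption for illustration.

```python
import numpy as np

# Demographic Parity and Equalized Odds gaps from binary predictions (0/1 arrays).
def demographic_parity_gap(pred, s):
    return abs(pred[s == 1].mean() - pred[s == 0].mean())            # inequality (8)

def equalized_odds_gaps(pred, s, y):
    gaps = []
    for yv in (0, 1):                                                 # inequality (9)
        m = (y == yv)
        gaps.append(abs(pred[m & (s == 1)].mean() - pred[m & (s == 0)].mean()))
    return gaps   # [gap at Y=0, gap at Y=1]; both should stay below the threshold tau
```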

4 Design of FACGAN

4.1 Model Structure

Fig. 1 The structure of the FACGAN model

Motivated by ACGAN and InfoGAN, the FACGAN is proposed in this paper, as shown in Fig. 1. It ensures that no sensitive information can be inferred from either the synthetic data or the predicted output, thereby achieving better fair classification. The FACGAN is composed of four separate networks: a Generator, a Discriminator, a Classifier, and an Adversary.

(1) Generator (G): Given random noise z and the latent factor c, each synthetic sample G(z, c) has a corresponding sensitive attribute \(s \sim P_{real}(s)\) and label \(y \sim P_{real}(y)\), i.e., the synthetic data is \((x_g | y, s) \sim P_g(x | y, s)\). The goal of G is to ensure that the synthetic data \((x_g | y, s)\) is close to the real data and can be correctly classified, while the user's sensitive attribute s cannot be inferred from it. The latent factor c consists of non-sensitive attributes that have little impact on the prediction of sensitive information, which helps the generated data retain good classification accuracy for downstream tasks.

(2) Discriminator (D): The D distinguishes the real data drawn from \(P_{real}(x | y, s)\) from the synthetic data drawn from \(P_g(x | y, s)\).

(3) Classifier (C): Unlike ACGAN, where D and C share parameters, the C in FACGAN is an independent network, so it can be optimized with its own loss alone. The C performs fair and correct class-label prediction on both the synthetic data \((x_g | y, s)\) and the real data \((x | y, s)\).

(4) Adversary (A): The A infers sensitive information \(s_g\) and \(s_c\) from the synthetic data \((x_g | y, s)\) and from the prediction results \(y_c\) of the C, trying to approximate the user's real sensitive information s. The former acts as a fairness constraint on data generation: G plays against A to ensure that the synthetic data satisfies the notion of data fairness. The latter serves as a fairness constraint on classification: C plays against A to guarantee that the prediction results satisfy the notion of classification fairness.

4.2 Latent Factor Construction

Fig. 2 The construction of the latent factor

If the input of the generator consists of random noise only, the generated representations have too much freedom. In addition, introducing fairness constraints further reduces data utility, making it difficult to guarantee the prediction accuracy of downstream tasks. Therefore, we select some non-sensitive attributes of the original sample as part of the input. However, since non-sensitive attributes may imply sensitive information due to correlations between attributes [29], we finally select non-sensitive attributes that are helpful for label prediction and have low impact on the prediction of sensitive information as the latent factor.

Specifically, we use the ANOVA method [30] for feature selection, which improves classification reliability. ANOVA tests whether an independent variable has a significant effect on the dependent variable by analyzing the sources of error to judge whether the means of different groups are equal. The error consists of the between-group error SSA and the within-group error SSE, and the F-statistic is the ratio of SSA to SSE (each normalized by its degrees of freedom). The larger the F-value, the larger the difference between groups, which means the independent variable affects the dependent variable, i.e., there is a high correlation between the two variables.

Moreover, we adopt adversarial training to further remove the correlation of the latent factor with sensitive information. This will reduce the probability that the generated representations encode sensitive information, while ensuring the prediction accuracy of downstream tasks. The process of constructing the latent factor is shown in Fig. 2, which primarily consists of the following steps.

(1) First, the ANOVA method is used to calculate the F-value of each non-sensitive attribute with respect to the label. The n attributes are then sorted by F-value from largest to smallest, and t attributes that are helpful for label prediction are selected as the candidate set for the latent factor according to a preset threshold \(\delta \). The threshold \(\delta \) refers to the accuracy obtained when the t attributes are used for label prediction.

(2) The F-values between the non-sensitive attributes in the candidate set and the sensitive attribute are calculated and ranked from smallest to largest, and the k attributes with low impact on the prediction of sensitive information are selected according to a preset threshold \(\varepsilon \). The threshold \(\varepsilon \) refers to the accuracy obtained when the k attributes are used for sensitive-attribute prediction. Finally, the latent factor is obtained.

Here, \(k \le t \le n \) and \(k \le \frac{n}{2}\). The values of the hyperparameters t, k, \(\delta \) and \(\varepsilon \) can be adjusted according to the specific dataset used.
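A minimal sketch of this two-stage selection using scikit-learn's ANOVA F-test (`f_classif`) is given below; the function name and the choice to pass t and k directly (rather than deriving them from the accuracy thresholds \(\delta \) and \(\varepsilon \)) are simplifying assumptions.

```python
import numpy as np
from sklearn.feature_selection import f_classif

# Sketch of the two-stage latent-factor selection (steps (1)-(2) above).
# X_nonsens: numpy array of non-sensitive attributes; y: class labels; s: sensitive attribute.
def select_latent_factor(X_nonsens, y, s, t, k):
    f_label, _ = f_classif(X_nonsens, y)              # F-values w.r.t. the label
    by_label = np.argsort(-f_label)[:t]               # keep the t attributes most useful for y
    f_sens, _ = f_classif(X_nonsens[:, by_label], s)  # F-values w.r.t. S on the candidate set
    by_sens = np.argsort(f_sens)[:k]                  # keep the k attributes least predictive of S
    return by_label[by_sens]                          # column indices of the latent factor c
```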

4.3 Model Training

Algorithm 1 Fair Representation Generation and Classification

The FACGAN is trained through collaborative adversarial training among D, A, G, and C. We use the binary cross-entropy to express the adversarial losses. Here, \(\Phi \), \(\varphi \), \(\theta \), and \(\psi \) denote the network parameters of D, A, G, and C, respectively.

For the D, whose objective is to maximize the probability of correctly discriminating the data as real or fake, the loss function is:

$$\begin{aligned} L_D(x, x_g, \Phi )= -log(D(x)) - log(1-D(x_g)) \end{aligned}$$
(10)

For the A, whose goal is to maximize the probability of inferring sensitive attributes from the synthetic data and class label predictions, the loss function is:

$$\begin{aligned} L_A(x_g, y_c, \varphi )= \lambda L_{GA}(x_g, \theta , \varphi ) + \mu L_{CA}(y_c, \psi , \varphi ) \end{aligned}$$
(11)

where \(L_{GA}(x_g, \theta , \varphi )\) and \(L_{CA}(y_c, \psi , \varphi )\) are the adversarial losses between A and G and between A and C, respectively. The hyperparameter \(\lambda \) specifies the trade-off between data utility and fairness, and \(\mu \) specifies the trade-off between classification utility and fairness. The two losses are defined as:

$$\begin{aligned} L_{GA}(x_g, \theta , \varphi ) = -slog(A(x_g)) - (1-s)log(1-A(x_g)) \end{aligned}$$
(12)
$$\begin{aligned} L_{CA}(y_c, \psi , \varphi ) = -slog(A(y_c)) - (1-s)log(1-A(y_c)) \end{aligned}$$
(13)

For the C, the goal is to maximize the probability of correctly predicting the class labels of the synthetic and real data, while minimizing the probability that the A infers sensitive attributes from the predicted class labels; the loss function is:

$$\begin{aligned} \begin{aligned} L_{C}(x, y, x_g, y_g, \psi ) = -ylog(C(x)) - y_glog(C(x_g)) - \mu L_{CA}(y_c, \psi , \varphi ) \end{aligned} \end{aligned}$$
(14)

For the G, the goal is to maximize the probability that the synthetic data is judged real and correctly classified, while minimizing the probability that the A infers sensitive attributes from the synthetic data; the loss function is:

$$\begin{aligned} L_{G}(x_g, \theta ) = log(1-D(x_g)) - y_glog(C(x_g)) - \lambda L_{GA}(x_g, \theta , \varphi ) \end{aligned}$$
(15)

In the cooperative adversarial training among the discriminator D, generator G, and classifier C, improving the utility of G improves the ability of C to make correct predictions, and improving C helps G generate more realistic samples for each class label. Moreover, in the adversarial training among the adversary A, generator G, and classifier C, improving the fairness of G improves the ability of C to classify fairly, and improving the fairness of C improves the ability of G to generate fair data. Therefore, by jointly training the generative model and the classifier, the FACGAN performs better than either an independent generative model or an independent classifier. The specific training process is shown in Algorithm 1.
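The following PyTorch sketch reconstructs one training step from Eqs. (10)-(15) and Algorithm 1. It assumes binary y and s stored as float tensors of shape (batch, 1), sigmoid outputs for D, C, and the adversary, a non-saturating form of the log(1-D(x_g)) term, and it models the single adversary A as two heads, A_x over synthetic samples and A_y over predicted labels; it is an illustrative reconstruction, not the authors' released code.

```python
import torch
import torch.nn.functional as F

# One FACGAN training step following Eqs. (10)-(15) and Algorithm 1 (illustrative sketch).
# x, y, s: real mini-batch (y also serves as the conditioning label of the synthetic batch);
# zc: concatenation of the encoded latent factor and random noise; all targets are (batch, 1) floats.
def facgan_step(G, D, C, A_x, A_y, x, y, s, zc, opt_d, opt_a, opt_c, opt_g, lam=1.0, mu=1.0):
    bce = F.binary_cross_entropy
    x_g = G(zc)                                        # synthetic batch conditioned on (y, s)

    # Discriminator, Eq. (10): separate real from synthetic samples
    loss_d = bce(D(x), torch.ones_like(y)) + bce(D(x_g.detach()), torch.zeros_like(y))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Adversary, Eqs. (11)-(13): infer s from synthetic data and from predicted labels
    y_c = C(x_g.detach())
    loss_a = lam * bce(A_x(x_g.detach()), s) + mu * bce(A_y(y_c.detach()), s)
    opt_a.zero_grad(); loss_a.backward(); opt_a.step()

    # Classifier, Eq. (14): classify real and synthetic data correctly, fool the adversary
    y_c = C(x_g.detach())
    loss_c = bce(C(x), y) + bce(y_c, y) - mu * bce(A_y(y_c), s)
    opt_c.zero_grad(); loss_c.backward(); opt_c.step()

    # Generator, Eq. (15): look real (non-saturating form), be classifiable, fool the adversary
    x_g = G(zc)
    loss_g = bce(D(x_g), torch.ones_like(y)) + bce(C(x_g), y) - lam * bce(A_x(x_g), s)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_a.item(), loss_c.item(), loss_g.item()
```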

5 Experimental Analysis

The experiment consists of two parts: (1) Testing the fairness of the synthetic dataset; (2) Testing the fairness of the classifier.

To verify the quality of the fair dataset generated by the FACGAN model, we measure the utility of the synthetic dataset by calculating the Euclidean distance between the synthetic and real datasets, and we additionally use prediction accuracy to measure the effectiveness of the synthetic dataset for downstream tasks. Moreover, we adopt \(\epsilon \)-Fairness as the fairness metric for the synthetic dataset.

To check the classification effectiveness of the FACGAN model, we use the prediction accuracy (Accuracy) and the fairness criteria Demographic Parity and Equalized Odds as metrics for the trained model on the real datasets.

Since existing GAN-based models focus on different aspects, we compare against three baseline methods on the aspects they cover: (a) ACGAN [25], which does not consider fairness, compared on data quality; (b) FairGAN [10], which can generate fair data, compared on fair data generation; (c) FairGAN\(^+\) [16], which has no input preprocessing, compared on fair data generation and classification.

In addition, a classifier trained on the unprocessed dataset is used as the Baseline. All classifiers built in ACGAN, FairGAN\(^+\), and FACGAN are logistic regression models.

5.1 Dataset Processing

We use the UCI Adult Income [31] and ProPublica COMPAS [32] datasets to verify the validity of the proposed model. According to the statistics, the two datasets exhibit serious gender and race bias, respectively.

UCI Adult Income This dataset contains basic information about US residents and their annual income, and can be used to predict whether a person earns more than 50K per year. The dataset includes 48,842 samples (45,222 after deleting records containing “?"), and each sample consists of 15 attributes. The sensitive attribute is “gender", where male is the protected group and female is the unprotected group. The positive class label indicates an annual income over 50K.

ProPublica COMPAS This dataset records the results of the COMPAS software for predicting whether more than 10,000 offenders in Florida would re-offend within two years. COMPAS annotates the defendants by asking them 137 questions. The dataset contains 11,757 samples (6150 after deleting records containing null or abnormal values), each described by 52 attributes (12 after data preprocessing). The sensitive attribute is “race", classified as white (specifically Caucasian) or non-white (including Black, African-American, Hispanic, Latino, Asian, and other races; only African-Americans were considered in our experiment), where white is the protected group and non-white is the unprotected group. The positive class label indicates that someone would not re-offend within two years.

Fig. 3 F-values of non-sensitive attributes

Fig. 4 Selection of t and k

5.1.1 Examples of Latent Factor Construction

According to the requirements of the proposed model, non-sensitive attributes that are helpful for class-label prediction and have low impact on the prediction of sensitive information are selected to construct the latent factor during data preprocessing. Therefore, we first calculate the F-value of each non-sensitive attribute with respect to the label and the sensitive attribute using the ANOVA method [30], and rank the attributes by the magnitude of the F-value.

For the Adult Income dataset, the F-values of the n (\(n=13\)) non-sensitive attributes with respect to the label Income and the sensitive attribute Gender are shown in Fig. 3a. First, we sort the attributes by F-value from largest to smallest and calculate the accuracy obtained when only the best t attributes (here, best means the largest F-value) are retained in turn; the results are shown on the left of Fig. 4a. Here, t takes values in the range \(6 \le t \le 13 \). To ensure high accuracy, we set the threshold \(\delta =0.87\) and select t (\(t=8\)) attributes that are helpful for label prediction: \(\{\)‘educational-num (en)’, ‘relationship (rs)’, ‘age (ae)’, ‘hours-per-week (hpw)’, ‘capital-gain (cg)’, ‘marital-status (ms)’, ‘capital-loss (cl)’, ‘education (et)’\(\}\). We then calculate the accuracy obtained when only the best k attributes (here, best means the smallest F-value) are retained in turn; the results are shown on the right of Fig. 4a. To ensure low accuracy on sensitive-attribute prediction, we set the threshold \(\varepsilon =0.68\) and remove the two attributes most correlated with the sensitive attribute, namely \(\{\)‘relationship (rs)’, ‘hours-per-week (hpw)’\(\}\). The final latent factor c consists of k (\(k=6\)) non-sensitive attributes, i.e., \(c = \{\)‘educational-num (en)’, ‘age (ae)’, ‘capital-gain (cg)’, ‘marital-status (ms)’, ‘capital-loss (cl)’, ‘education (et)’\(\}\).

For the COMPAS dataset, the F-values of the n (\(n=10\)) non-sensitive attributes are shown in Fig. 3b. Similarly, we sort the attributes by F-value from largest to smallest and calculate the accuracy obtained when only the best (largest F-value) t attributes are retained in turn; the results are shown on the left of Fig. 4b. Here, t takes values in the range \(4 \le t \le 10 \). To ensure high accuracy, we set the threshold \(\delta =0.73\) and select t (\(t=7\)) attributes that are helpful for label prediction: \(\{\)‘is_violent_recid (ivr)’, ‘decile_score (ds)’, ‘priors_count (pc)’, ‘age (ae)’, ‘juv_other_count (joc)’, ‘juv_misd_count (jmc)’, ‘sex (sx)’\(\}\). Then, the accuracy obtained when only the best (smallest F-value) k attributes are retained is calculated in turn; the results are shown on the right of Fig. 4b. To ensure low accuracy on sensitive-attribute prediction, we set the threshold \(\varepsilon =0.63\) and remove the three attributes most correlated with the sensitive attribute, namely \(\{\)‘decile_score (ds)’, ‘priors_count (pc)’, ‘age (ae)’\(\}\). The final latent factor c consists of k (\(k=4\)) non-sensitive attributes, i.e., \(c = \{\)‘is_violent_recid (ivr)’, ‘juv_other_count (joc)’, ‘juv_misd_count (jmc)’, ‘sex (sx)’\(\}\).

Since the above datasets are tabular, each sample contains categorical and continuous attributes. For the categorical attributes in the original dataset R, we apply one-hot encoding for dimension expansion, while the continuous attributes are normalized to the interval \( [-1,1] \). Similarly, for the dataset \(R'\), the dimensions corresponding to the attributes in the latent factor c are encoded by one-hot encoding and normalization, and the remaining dimensions consist of random noise \(z \sim P_z\).
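A brief sketch of this preprocessing under assumed column names: categorical columns are one-hot encoded, continuous columns are scaled to [-1, 1], and the generator input is formed by concatenating the encoded latent-factor columns with random noise.

```python
import numpy as np
import pandas as pd

# One-hot encode categorical columns and scale continuous ones to [-1, 1].
def preprocess(df, categorical, continuous):
    onehot = pd.get_dummies(df[categorical].astype(str)).astype(float)
    cont = df[continuous].astype(float)
    cont = 2 * (cont - cont.min()) / (cont.max() - cont.min()) - 1
    return pd.concat([onehot, cont], axis=1)

# Generator input: encoded latent-factor columns for a batch, padded with random noise.
def generator_input(latent_factor_batch, noise_dim):
    z = np.random.randn(len(latent_factor_batch), noise_dim)
    return np.concatenate([np.asarray(latent_factor_batch, dtype=float), z], axis=1)
```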

5.1.2 Dataset Construction

To explore the impact of training datasets with different distributions on the accuracy and fairness of the proposed model, for the Adult Income dataset we additionally change the distributions of samples over the sensitive attribute ‘gender’ and the label ‘income’ in the original dataset to construct three real datasets.

(1) Imbalanced-real dataset (Imba-real dataset)

The numbers of samples by sensitive attribute in the original dataset are 30,527 male and 14,695 female. The numbers of samples by label are 11,208 with income \(>50K\) and 34,014 with income \(\le 50K\).

(2) Gender balanced-real dataset (GDba-real dataset)

Redundant male samples are randomly removed from the original dataset, yielding 14,695 samples for each gender.

(3) Income balanced-real dataset (ICba-real dataset)

Redundant income \(\le 50K\) samples are randomly removed from the original dataset, yielding 11,208 samples for each of income \(>50K\) and income \(\le 50K\).
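The balanced variants can be built by randomly dropping surplus rows of the majority group, for example as in the following pandas sketch; the column names are assumptions based on the descriptions above.

```python
import pandas as pd

# Randomly drop surplus rows of the larger group so both groups match the minority size.
def balance(df, column, seed=0):
    n_min = df[column].value_counts().min()
    return (df.groupby(column, group_keys=False)
              .apply(lambda g: g.sample(n=n_min, random_state=seed))
              .reset_index(drop=True))

# GDba-real: balance(adult_df, 'gender')   -> 14,695 samples per gender
# ICba-real: balance(adult_df, 'income')   -> 11,208 samples per income class
```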

The FACGAN is compared with the other methods on each of the three real datasets above. Each real dataset is randomly split into a training set and a testing set, with 2/3 used for training and 1/3 for testing. The synthetic datasets generated to correspond to the three real datasets are:

(1) Imbalanced-synthetic dataset (Imba-synthetic dataset): with the same number of both sensitive attribute samples and label samples as the Imba-real dataset.

(2) Gender balanced-synthetic dataset (GDba-synthetic dataset): with the same number of sensitive attribute samples as the GDba-real dataset.

(3) Income balanced-synthetic dataset (ICba-synthetic dataset): with the same number of label samples as the ICba-real dataset.

5.2 Experimental Configurations

For the Adult Income dataset, the generator G uses two hidden layers, each with 128 neurons. The discriminator D and the adversary A each have one hidden layer with 256 neurons. The classifier C has two hidden layers, with 64 neurons in the first layer and 32 in the second. The hidden layers use the ReLU activation function and the output layer uses the Sigmoid function. For the COMPAS dataset, the generator G uses two hidden layers with 64 and 32 neurons; the discriminator D and the adversary A each use one hidden layer with 64 neurons; and the classifier C uses two hidden layers with 32 and 16 neurons.
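The layer sizes above can be written down directly, as in the sketch below for the Adult Income networks; the input widths, the Tanh output of G (matching the [-1, 1] scaling of continuous attributes), and the single-output heads are assumptions for illustration.

```python
import torch.nn as nn

# Adult Income network sizes as reported above; in_dim is the encoded sample width and
# z_dim the width of the latent factor plus noise, both left as parameters here.
def adult_income_networks(in_dim, z_dim):
    G = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(),
                      nn.Linear(128, 128), nn.ReLU(),
                      nn.Linear(128, in_dim), nn.Tanh())
    D = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, 1), nn.Sigmoid())
    A = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, 1), nn.Sigmoid())
    C = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                      nn.Linear(64, 32), nn.ReLU(),
                      nn.Linear(32, 1), nn.Sigmoid())
    # A second adversary head over the classifier output y_c (1-dimensional input,
    # same hidden width) is assumed for the L_CA term of Eq. (13).
    return G, D, A, C
```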

During model training, all hyperparameters are set through a broad grid search. We first manually determined the search ranges: for the learning rate, the grid covers the range [1e-5, 1e-2] with step sizes selected from {2e-5, 5e-5, 1e-4, 2e-4}; for the number of epochs, the grid covers the range [500, 5000] with step sizes chosen from {20, 50, 100, 150, 200}; and the batch size is chosen from {16, 32, 64, 128, 256, 512}. Finally, we chose an optimal set of values: 3000 epochs and a learning rate of 1e-4, with a batch size of 128 on the Adult Income dataset and 64 on the COMPAS dataset.

In addition, since the hyperparameters \(\lambda \) and \(\mu \) play a key role in the fairness and utility of the model, we further analyze their impact on the experimental results by setting their values to \(\{0.5, 1, 1.5, 2\}\). For each hyperparameter value, we run ten training sessions, remove the best and worst cases, and average the remaining results. The hardware and software configurations used for the experiments are detailed in Table 1.

Table 1 Configurations of hardware and software

5.3 Analysis of Fair Datasets

5.3.1 Analysis of Latent Factors

The proposed adversarial fair representation model (FACGAN) effectively encodes the original samples: through the game between the generator and the adversary, it eliminates the strong correlation between non-sensitive and sensitive attributes in the generated samples, and thus removes the implicit bias caused by correlations between attributes.

Fig. 5 Accuracies of latent factors for predicting labels and sensitive attributes

To support this claim, we separate out the encoding of the dimensions in which the latent factor is located in the generated representation and conduct experiments that predict the label and the sensitive attribute, respectively. The results are shown in Fig. 5, where ‘income’ and ‘gender’ correspond to the label and the sensitive attribute of the Adult Income dataset, and ‘is_recid’ and ‘race’ correspond to those of the COMPAS dataset.

As the number of adversarial training iterations increases, the accuracy of the latent factor in predicting both the label and the sensitive attribute decreases. First, this verifies that our method reduces the correlation between non-sensitive attributes and sensitive information, providing some interpretability for fair representation generation. Second, fairness and utility are in tension: the fairer the generated representation, the worse its utility, so the accuracy of the latent factor on label prediction also decreases.

5.3.2 Comparison of Synthetic Datasets

For the quality of the synthetic dataset, \(\epsilon \)-Fairness is used as the fairness metric: \(BER(C(X), S) = [P(C(X) =0|S=1) + P(C(X) =1|S=0)] / 2\). Data utility is measured by the Euclidean distance between the synthetic and real datasets: \(ED(X, S)=||P_{real}(X, S) - P_g(X, S)||_2\).
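Since \(P_{real}(X, S)\) and \(P_g(X, S)\) are not available in closed form, one simple way to approximate ED is to compare binned empirical distributions of the two datasets, as in the sketch below; treating columns independently after binning is a simplifying assumption made here, not necessarily the paper's exact procedure.

```python
import numpy as np

# Approximate ED(X, S) from binned empirical distributions; real_df and synth_df are
# DataFrames with the same (non-sensitive plus sensitive) columns.
def euclidean_distance(real_df, synth_df, bins=10):
    total = 0.0
    for col in real_df.columns:
        edges = np.histogram_bin_edges(real_df[col], bins=bins)
        p_real, _ = np.histogram(real_df[col], bins=edges)
        p_synth, _ = np.histogram(synth_df[col], bins=edges)
        p_real = p_real / p_real.sum()
        p_synth = p_synth / max(p_synth.sum(), 1)
        total += np.sum((p_real - p_synth) ** 2)
    return np.sqrt(total)
```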

Table 2 Data fairness and utility of synthetic datasets with \(\lambda \)=1 and \(\mu \)=1

We first compare the data utility and fairness of the datasets generated by ACGAN, FairGAN, FairGAN\(^+\), and FACGAN. The experimental results are shown in Table 2, where D1 is the real dataset without any processing and refers to the Adult Income (Imba-real dataset) in Table 2a, the Adult Income (GDba-real dataset) in Table 2b, the Adult Income (ICba-real dataset) in Table 2c, and the COMPAS dataset in Table 2d. D2 is the dataset generated by ACGAN, and D3 is generated by FairGAN with \(\lambda =1\). D4 and D5 are generated by FairGAN\(^+\) (\(\lambda =1\) and \(\mu =1\)) to satisfy Demographic Parity and Equalized Odds, respectively. D6 and D7 are generated by FACGAN (\(\lambda =1\) and \(\mu =1\)) to satisfy Demographic Parity and Equalized Odds, respectively.

From Table 2, it can be seen that the dataset generated by ACGAN has good data utility but lower fairness, since it is unaware of \(\epsilon \)-Fairness. In contrast, the datasets generated by FairGAN, FairGAN\(^+\), and FACGAN predict the real sensitive attribute S with a higher error rate BER, i.e., they have better fairness than both D1 and D2. In particular, it can be observed from Table 2a, b, and d that the data generated by FACGAN yield an even higher error rate when predicting the sensitive attribute S. Since FACGAN removes, at the input, the sensitive attributes and the non-sensitive attributes with high impact on the prediction of sensitive information, the synthetic data encode sensitive attributes with lower probability and are therefore fairer. In addition, regarding the classification accuracy of the synthetic data, the generator of FACGAN retains non-sensitive attributes that are helpful for label prediction, so it achieves higher accuracy than approaches that use only random noise as the generator input.

Moreover, the most significant finding is that the distribution of the samples over the label is crucially important to the performance of the adversary. We observe from Table 2c that, unlike the results obtained on the other datasets, label-balanced data improve the fairness of FairGAN and FairGAN\(^+\) much more markedly but also decrease data utility in the process: they achieve a higher error rate when predicting the sensitive attribute S, yet the Euclidean distance from the original dataset becomes larger and the classification accuracy drops more substantially, which cannot guarantee the usability of the synthetic data. Although FACGAN is also affected by the label-balanced data, i.e., its classification accuracy is lower than on the other datasets, it can still improve the fairness of the synthetic dataset while keeping good data utility and classification accuracy, since non-sensitive attributes that are helpful for label prediction are retained in the model input.

Fig. 6 Data fairness and utility of synthetic datasets on the Imba-synthetic dataset with \(\mu \) = 1

Fig. 7 Data fairness and utility of synthetic datasets on the GDba-synthetic dataset with \(\mu \) = 1

Fig. 8 Data fairness and utility of synthetic datasets on the ICba-synthetic dataset with \(\mu \) = 1

In addition, to investigate the effect of the hyperparameter \(\lambda \) on model training, experiments are conducted on the Adult Income dataset by fixing \(\mu = 1\) and varying \(\lambda \). We compare the data utility and fairness of the datasets generated by FairGAN\(^+\) and FACGAN under different values of \(\lambda \); the results are shown in Figs. 6, 7, and 8. As the figures show, the fairness of all three synthetic datasets with different distributions increases as \(\lambda \) increases, while the Euclidean distance between the synthetic and real datasets keeps growing, i.e., data utility decreases with increasing \(\lambda \). However, even as \(\lambda \) changes, all three synthetic datasets maintain relatively stable classification accuracy on the classifiers trained by FairGAN\(^+\) and FACGAN, and FACGAN achieves better classification accuracy.

5.4 Analysis of Fair Classification

For the classification effectiveness of the model on the real dataset, we use the classification accuracy and the classification-fairness notions Demographic Parity and Equalized Odds as metrics. The fairness metrics of the classifier are computed as follows.

(1) \(DP(C) = |P(C(X)=1|S=1) - P(C(X)=1|S=0)|\)

(2) \(Eos1(C) = |P(C(X)=1|Y=1, S=1) - P(C(X)=1|Y=1, S=0)|\)

(3) \(Eos2(C) = |P(C(X)=1|Y=0, S=1) - P(C(X)=1|Y=0, S=0)|\)

5.4.1 Analysis of Adult Income Dataset

(1) Fair classification of FACGAN classifiers

Table 3 Classification fairness and accuracy of different classifiers for FACGAN on real dataset

To verify whether joint adversarial training of the generator and classifier achieves better fair classification, we train FACGAN with fixed hyperparameters (\(\lambda =1\) and \(\mu =0\)) and (\(\lambda =1\) and \(\mu =1\)), respectively. The former imposes fairness constraints only on the generator, while the latter imposes fairness constraints on both the generator and the classifier. We then compare the classification fairness and accuracy of the classifiers obtained from these two training settings on real datasets with different distributions; the results are shown in Table 3. C1 is the result of classifying the real dataset using the Baseline classifier, where the real dataset refers to the Adult Income (Imba-real dataset) in Table 3a, the Adult Income (GDba-real dataset) in Table 3b, and the Adult Income (ICba-real dataset) in Table 3c. C2 is the result of classifying the real dataset using the FACGAN (\(\lambda =1\) and \(\mu =0\)) classifier; C3 and C4 are the results of classifiers trained with FACGAN (\(\lambda =1\) and \(\mu =1\)), where C3 satisfies Demographic Parity and C4 satisfies Equalized Odds.

From the table, it can be seen that the DP(C) of C2 is much smaller than that of C1, while its Eos1(C) is larger than that of C1. Hence, a classifier trained only from fair data can achieve Demographic Parity, but Equalized Odds cannot be guaranteed. Moreover, from the results of C3 and C4, the DP(C) of C3 becomes smaller than that of C2, and the Eos(C) of C4 decreases significantly. This shows that joint training of the generator and classifier yields better classification effectiveness, and that applying the Equalized Odds fairness constraint to the classifier appears to be more effective.

(2) Fair classification of real datasets

Table 4 Classification fairness and accuracy of different classifiers on real dataset with \(\lambda \)=1 and \(\mu \)=1

In addition, with the hyperparameters fixed at \(\lambda =1\) and \(\mu =1\), the classification fairness and accuracy of the classifiers trained by the Baseline, FairGAN\(^+\), and FACGAN methods are compared on real datasets with different distributions. The results are shown in Table 4. C1 is the result of classifying the real dataset using the Baseline classifier, where the real dataset is the same as in Table 3. C2 and C3 are the results of classifying the real dataset using the FairGAN\(^+\) classifier, where C2 satisfies Demographic Parity and C3 satisfies Equalized Odds. C4 and C5 are the results of classifying the real dataset using the FACGAN classifier, where C4 satisfies Demographic Parity and C5 satisfies Equalized Odds.

As can be seen from the table, the classifiers trained by FairGAN\(^+\) and FACGAN are fairer than the Baseline when classifying real datasets with different distributions, while the introduction of fairness constraints also reduces classification accuracy. Furthermore, FACGAN has better classification fairness overall than FairGAN\(^+\), while both maintain good classification utility.

Fig. 9 Classification fairness and accuracy of different classifiers on the Imba-real dataset with \(\lambda \)=1

Fig. 10 Classification fairness and accuracy of different classifiers on the GDba-real dataset with \(\lambda \)=1

Fig. 11 Classification fairness and accuracy of different classifiers on the ICba-real dataset with \(\lambda \)=1

In addition, to investigate the effect of the hyperparameter \(\mu \) on model training, \(\lambda =1\) is fixed on the Adult Income dataset. The classification fairness and accuracy of the classifiers produced by the Baseline, FairGAN\(^+\), and FACGAN are compared on real datasets with different distributions by varying \(\mu \). The results are shown in Figs. 9, 10, and 11. As the figures show, the classification fairness of the classifiers increases as \(\mu \) increases, while the classification accuracy decreases. Therefore, the specific values of the hyperparameters \(\lambda \) and \(\mu \) should trade off data utility, classification accuracy, and fairness, i.e., the fairness of data and classification should be improved as much as possible while ensuring user-acceptable data utility and classification accuracy.

5.4.2 Analysis of COMPAS Dataset

On the COMPAS dataset, we also first verify that a classifier trained only from fair data can achieve fair classification; the results are shown in Table 5. C1 is the result of classifying the real dataset using the Baseline classifier, and C2 is the result using the classifier trained by FACGAN (\(\lambda =1\) and \(\mu =0\)). C3 and C4 are the results using classifiers trained by FACGAN (\(\lambda =1\) and \(\mu =1\)), where C3 satisfies Demographic Parity and C4 satisfies Equalized Odds. Similar to the results on the Adult Income dataset, C2, trained from fair data, meets Demographic Parity, while Equalized Odds is not achieved. Furthermore, applying fairness constraints to the classifier yields more effective classification fairness.

Table 5 Classification fairness and accuracy of different classifiers for FACGAN
Table 6 Classification fairness and accuracy of different classifiers with \(\lambda \)=1 and \(\mu \)=1

In addition, with the hyperparameters fixed at \(\lambda =1\) and \(\mu =1\), the classification fairness and accuracy of the classifiers produced by the Baseline, FairGAN\(^+\), and FACGAN are compared on the real dataset. The results are shown in Table 6. As the table shows, and similar to the results on the Adult Income dataset, the classifiers trained with FairGAN\(^+\) and FACGAN are fairer than the Baseline but lose some classification accuracy. FACGAN has better classification fairness than FairGAN\(^+\) overall, while both maintain good classification utility.

6 Conclusion

In this paper, we proposed an information-minimizing generative adversarial network model, called FACGAN, which can not only generate fair representations for the construction of fair datasets but also train the model to achieve classification fairness. The new design has the following properties. First, a classifier is introduced into the original GAN to improve the ability of the generator to generate more realistic samples with different class labels. Second, an adversary is introduced to train the generator to generate fair representations and the classifier to make fair predictions. In addition, the FACGAN removes, at the input, the sensitive attributes and the non-sensitive attributes that are strongly associated with them, which effectively reduces the accuracy loss and enables the model to obtain better results from fair adversarial training. The experimental results show that both the generator and the classifier obtained from the collaborative adversarial training better achieve data and classification fairness.

In future work, we will consider another factor that causes bias in model predictions: imbalanced data, i.e., the model has higher classification accuracy on majority-class samples. We will consider augmenting minority-class samples using a GAN to balance the original dataset, making the prediction results fairer while ensuring good classification accuracy.