Abstract
Studies show that machine learning models trained on biased data can discriminate against groups with certain sensitive attributes. This problem can be mitigated by cleaning the original data or by learning fair representations. However, collecting real data is extremely time- and resource-consuming, whereas generative models (e.g., GANs) can create new data that enable more application scenarios. Therefore, fair data produced by generative models can benefit various downstream tasks. In this paper, we propose an information-minimizing generative adversarial network that improves the fairness of machine learning by generating fair data. An ANOVA-based latent factor is constructed as part of the model input to reduce the accuracy loss, and joint adversarial training of the generator and the classifier better addresses indirect discrimination and achieves fair classification. Extensive experiments in various settings show the effectiveness of the proposed method.
1 Introduction
Data-driven machine learning algorithms have recently achieved unprecedented success in computer vision, natural language processing, search engines, recommendation systems and other fields. Related applications span many industries, such as the Internet, security, medicine, transportation and finance, and contribute strongly to social development. However, in some scenarios, e.g., offender risk assessment, salary prediction or loan approval, machine learning models used for automated decision-making may be socially biased against groups of specific genders or races. This bias arises because the training data used by machine learning models are sampled from real life or are synthetic (simulated real samples). These data reflect people's preferences or biases in various ways, and the evaluation metrics used during training cause models to amplify these preferences or biases [1]. As a result, models trained on biased datasets behave in a discriminatory way, making different predictions for people with different sensitive attributes (e.g., gender, race and age) [2]. Moreover, discrimination can be direct or indirect, and indirect discrimination has two aspects, namely proxy discrimination and disparate impact [3]. Proxy discrimination results from the correlation between sensitive attributes and other attributes, whereas disparate impact refers to the correlation between sensitive attributes and the output. Therefore, removing sensitive attributes eliminates only direct discrimination and cannot resolve the unfairness caused by indirect discrimination.
Currently, many fair machine learning methods prevent discrimination by modifying the training data to eliminate biases associated with sensitive attributes [4, 5]. In particular, adversarial training [6] mitigates bias by learning new representations from which the target variable, but not the sensitive attributes, can be predicted. In addition, because the Generative Adversarial Network (GAN) [7] has produced high-quality synthetic data similar to real data [8, 9], many studies have attempted to construct fair datasets by combining adversarial training with GANs [10,11,12,13,14,15]. Learning fair classifiers from fair synthetic data has a significant advantage, since obtaining real datasets of good quality and fairness is extremely resource-intensive in most learning scenarios.
However, for synthetic data, the fairer the data are, the lower their utility, so the balance between data fairness and utility must be considered during model training. In addition, generating fair data can effectively eliminate proxy discrimination and achieve statistical fairness, but disparate impact, which involves the outputs of downstream tasks, can be mitigated further. Based on this analysis, we propose an information-minimizing generative adversarial network for fair generation and classification (FACGAN). The contributions of this work are as follows:
(1) The model is trained through joint adversarial learning of the generator and the classifier to better eliminate indirect discrimination. The generator is trained to minimize the mutual information between non-sensitive and sensitive attributes, while the classifier minimizes the mutual information between predicted labels and sensitive attributes.
(2) An ANOVA-based latent factor is constructed to reduce the accuracy loss caused by introducing fairness constraints: sensitive attributes are removed from the model input, while non-sensitive attributes that are helpful for label prediction and have little impact on the prediction of sensitive information are selected to construct the latent factor.
(3) The experimental results show that our approach significantly improves the downstream classification accuracy achieved with fair data and further improves the ability of models trained on fair datasets to classify fairly.
2 Related Work
Achieving fairness in learning models is currently an imperative task in machine learning. Constructing fair datasets and training fair prediction models are two prominent approaches to fair machine learning. The former eliminates unfairness by detecting bias or discrimination in the training data and modifying or re-representing the data during pre-processing. The latter adjusts, corrects and refines the prediction model or algorithm during training (in-processing).
2.1 Fair Dataset Construction
For constructing fair datasets, modifying the training data is the intuitive approach. Examples include correcting the class labels of certain individuals, reweighting samples to balance the data, and changing the sample sizes of different subgroups to eliminate bias [4, 5]. These methods overcome or mitigate discrimination in the training data to some extent. However, they only pre-process the original training data and cannot generalize to unseen data. In contrast, several works combine adversarial training with GANs to generate unbiased representations. This approach does not require modifying the training data and preserves data integrity.

Xu et al. [10] proposed FairGAN, which adds an extra discriminator to the original GAN. The generator is thereby trained to produce fair data that are independent of sensitive attributes, enabling fair classification. Kairouz et al. [11] introduced a data-driven framework for learning fair universal representations, which can guarantee statistical fairness for any unknown learning task. Adversarial learning is used to generate representations with universal properties in which sensitive features are actively decoupled. Xu et al. [12] presented a causal fairness-aware GAN (CFGAN) to generate high-quality fair data. CFGAN consists of two generators and two discriminators: the generators simulate the original causal model and the intervention model, while the discriminators enforce high data utility and causal fairness.

Ngxande et al. [13] built a generative adversarial network that generates images of Black individuals. The aim was to alleviate dataset imbalance and improve the accuracy of a driver fatigue detection model; however, the work did not consider the issue of resource equity. Sattigeri et al. [14] proposed a GAN with an auxiliary classifier to generate high-dimensional image data, which can augment model training and make the learned model fairer. Adeli et al. [15] removed sensitive attributes using a domain adversarial network. They measured the statistical dependence between attributes with the Pearson correlation coefficient, which is suited to continuous, ordered variables.
However, introducing adversarial training is equivalent to perturbing the data generated by the original GAN, which reduces the generation quality of the generator to some extent. In addition, constructing a fair dataset cannot fully guarantee fair classification by models trained on it. Some classification-based fairness notions require constraining the outputs of downstream tasks and are therefore more suitable for prediction models.
2.2 Fair Prediction Model Construction
For training fair prediction models, some works mitigate bias in model predictions by adjusting the learning process, for example by introducing constraints or regularization terms into the objective function [17, 18]. Adversarial training is also used to remove information about sensitive attributes from a model's intermediate representations to obtain a fair classifier. Xu et al. [16] proposed an improved model, FairGAN\(^+\), which introduces an additional pair of classifier and discriminator so that the classifier can be trained to achieve fairer classification. However, it does not remove sensitive attributes from the model input and does not consider the utility of the generated data. This reduces the effectiveness of de-biasing and causes a larger accuracy loss for downstream tasks.

Beutel et al. [19] used adversarial training to remove sensitive information from the latent representations learned by a neural network, achieving fair classification. Zhang et al. [20] proposed a method for training unbiased, general and powerful machine learning models. The method supports multiple fairness constraints regardless of the complexity of the prediction model and of whether the prediction and protected variables are discrete or continuous. Abusitta et al. [21] first identified which group in the dataset the classifier was biased against. They then used the features of that group as a condition for a GAN that generates synthetic data for the group, supplementing the original dataset to obtain fair classification results.

Delobelle et al. [22] trained a model with gradient reversal to prevent its decisions from predicting the values of protected attributes while reducing the utility loss. New adversarial samples generated with evasion attacks from adversarial learning are used to retrain and improve the model, addressing the problems caused by gradient reversal. Dhar et al. [23] proposed an adversarial gender de-biasing algorithm (AGENDA) that reduces the gender information in face descriptors produced by previously trained face recognition networks. This lowers the gender predictability of the descriptors and mitigates gender bias in face verification while maintaining reasonable recognition performance. Han et al. [24] introduced an improved IIoV framework (FIIoV) to address the issue that traditional fatigue detection models can be biased against certain groups and lead to inequitable resource allocation. The framework detects driver fatigue with a convolutional neural network (CNN) and then uses an adversarial network to achieve fairness.
Existing studies usually train fair generation and prediction models separately. However, joint adversarial training of the generator and the classifier can improve the performance of both. Therefore, FACGAN combines the adversarial training of the generation and prediction models to generate fair representations and achieve fair classification. Furthermore, unlike existing methods, we pay particular attention to the accuracy loss caused by introducing fairness constraints. FACGAN replaces some noise dimensions of the model input with selected non-sensitive attributes that are helpful for label prediction and weakly associated with sensitive information. This mitigates the accuracy loss and better balances fairness and utility.
3 Preliminary
3.1 Generative Adversarial Network and Variants
3.1.1 GAN Model
The Generative Adversarial Network (GAN) [7] is an architecture for training generative models through adversarial networks. It consists of two parts, a generator and a discriminator, both of which are multilayer neural networks. Given a random noise variable \(z \sim P(z)\) as input, the generator G(z) learns a generative distribution \(P_{g}\) that matches the real data distribution \(P_{real}(x)\). Meanwhile, the discriminator D is a binary classifier that predicts whether a sample comes from the real data or is fake data produced by the generator. The objective loss function of D is:
where \(D(\cdot )\) outputs the probability that its input comes from the real data rather than the fake data. For the generator G(z), the objective loss function is:
Minimizing Eq. (2) ensures that the discriminator is fooled by G(z), i.e., D assigns a high probability that G(z) is real data. Thus, GAN training can be formalized as a minimax game with the value function:
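For reference, the original GAN paper [7] states the value function as follows. The losses of D and G in Eqs. (1) and (2) are presumably its two halves, written as quantities to minimize. The generator loss may instead take the common non-saturating form \(-\mathbb{E}_{z}[\log D(G(z))]\), which also matches the description of Eq. (2).

```latex
\min_{G}\max_{D} V(D,G)
  = \mathbb{E}_{x \sim P_{real}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim P(z)}\big[\log\big(1 - D(G(z))\big)\big]
```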
3.1.2 ACGAN Model
The Auxiliary Classifier Generative Adversarial Network (ACGAN) [25] is a variant of GAN in which each synthetic sample has a corresponding class label y that is given to the generator along with the noise z. The discriminator D must both determine whether the input data are real or fake and classify them. Classification is handled by an added auxiliary classifier that shares its network structure and parameters with D. The generator G can thus generate samples that follow the real data distribution and satisfy the class-label condition. The objective function of the model consists of two parts: judging whether the input data are real or fake, and classifying them correctly. For D, the objective loss function is:
For G, it is:
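For reference, the ACGAN paper [25] defines two log-likelihood terms. They are written below with src for the real/fake source and y for the class label, to avoid clashing with the sensitive attribute S and the classifier C used later. D maximizes \(L_{src} + L_{cls}\) and G maximizes \(L_{cls} - L_{src}\). The losses in Eqs. (4) and (5) are presumably these objectives with signs chosen for minimization.

```latex
\begin{aligned}
L_{src} &= \mathbb{E}\big[\log P(src = \text{real} \mid x_{real})\big]
         + \mathbb{E}\big[\log P(src = \text{fake} \mid x_{fake})\big] \\
L_{cls} &= \mathbb{E}\big[\log P(y \mid x_{real})\big]
         + \mathbb{E}\big[\log P(y \mid x_{fake})\big]
\end{aligned}
```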
3.1.3 InfoGAN Model
Since the generator input z in GAN is a continuous noise signal without any constraints, GAN cannot map specific dimensions of z to semantic features of the output, which leads to poor interpretability. In contrast, the Information-maximizing Generative Adversarial Network (InfoGAN) [26] splits the input into incompressible noise z and an interpretable latent factor c. Constraining the relationship between c and the synthetic data makes c carry interpretable information about the data. The objective function of the model is:
where I(c, G(z, c)) denotes the mutual information between c and the synthetic data; the larger its value, the higher the correlation between c and the synthetic data. In InfoGAN, the auxiliary network is trained to infer the value of the real latent factor from synthetic data. In contrast, the proposed approach aims to minimize an adversary's ability to infer sensitive information from synthetic data and predicted labels. The goal is to better address indirect discrimination, i.e., proxy discrimination and disparate impact.
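For reference, InfoGAN [26] writes its objective as follows. V(D, G) is the GAN value function, and λ is InfoGAN's own weighting hyperparameter, unrelated to the λ used in Section 4.3. Because the mutual information is hard to compute directly, InfoGAN maximizes a variational lower bound \(L_I(G, Q)\), computed by an auxiliary network Q that approximates the posterior of c.

```latex
\min_{G}\max_{D} V_I(D,G) = V(D,G) - \lambda\, I\big(c, G(z,c)\big),
\qquad
I\big(c, G(z,c)\big) \;\ge\; L_I(G,Q)
  = \mathbb{E}_{c \sim P(c),\, x \sim G(z,c)}\big[\log Q(c \mid x)\big] + H(c)
```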
3.2 Fairness Definitions
In fairness studies of machine learning, there are fairness notions based on data and notions based on classification. A labeled dataset D(X, Y, S) contains a set of non-sensitive attributes \(X \in R^n\), class labels \(Y \in \{0,1\}\) and sensitive attributes \(S \in \{0,1\}\); for ease of discussion, we take S and Y to be binary. We now introduce the fairness definitions used by the proposed model.
3.2.1 Definition of the Fair Dataset
\(\epsilon \)-Fairness quantifies data fairness by the error rate of predicting the sensitive attribute S from the non-sensitive attributes X. A lower error rate means that S can be predicted more easily from X, i.e., the data are less fair. For classifiers trained on synthetic datasets and tested on real datasets, classification fairness is achieved if the disparate impact of the real sensitive attributes is removed from the synthetic dataset. We formally define \(\epsilon \)-Fairness as follows.
Definition 1
(\(\epsilon \)-Fairness) [27] A data distribution D(X, Y, S) satisfies \(\epsilon \)-Fairness if \(BER(C(X), S) > \epsilon \) for any classifier \(C: X \rightarrow S\). \(\epsilon \)-Fairness measures the potential discrimination caused by the correlation between the non-sensitive attributes X and the sensitive attribute S. Here, BER is the balanced error rate, i.e., the average class-conditional error of C on the data distribution D(X, Y, S). It is given by Eq. (7): \(BER(C(X), S) = \frac{P(C(X)=0 \mid S=1) + P(C(X)=1 \mid S=0)}{2}\).
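A minimal Python sketch of the empirical check behind Definition 1 follows. It assumes S is coded as 0/1 and that the predictions of S come from some trained classifier; the function names are hypothetical. Because the definition quantifies over every classifier C, the BER of a single classifier is only an empirical estimate. A BER near 0.5 means S is essentially unpredictable from X, while a BER near 0 means X leaks S.

```python
from typing import Dict, Sequence


def balanced_error_rate(s_pred: Sequence[int], s_true: Sequence[int]) -> float:
    """BER of predicting a binary sensitive attribute S from X.

    Computes (P(C(X)=0 | S=1) + P(C(X)=1 | S=0)) / 2, i.e. the mean of the
    two class-conditional error rates.
    """
    if len(s_pred) != len(s_true):
        raise ValueError("s_pred and s_true must have the same length")
    errors: Dict[int, int] = {0: 0, 1: 0}
    counts: Dict[int, int] = {0: 0, 1: 0}
    for pred, true in zip(s_pred, s_true):
        counts[true] += 1
        if pred != true:
            errors[true] += 1
    if counts[0] == 0 or counts[1] == 0:
        raise ValueError("both sensitive groups must be present")
    return (errors[1] / counts[1] + errors[0] / counts[0]) / 2


def satisfies_epsilon_fairness(
    s_pred: Sequence[int], s_true: Sequence[int], epsilon: float
) -> bool:
    """Empirical check BER > epsilon for one classifier's predictions of S."""
    return balanced_error_rate(s_pred, s_true) > epsilon


# One of the two S=1 samples is misclassified; both S=0 samples are correct.
print(balanced_error_rate([1, 0, 0, 0], [1, 1, 0, 0]))  # 0.25
```

In practice a strong classifier is preferred for this check, since a weak one overestimates BER and makes the data look fairer than they are.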
3.2.2 Definition of the Fair Classification
Commonly used measures of classification fairness are Demographic Parity and Equalized Odds (and their variants). Demographic Parity requires complete independence between the classifier's prediction C(X) and the sensitive attribute S to achieve group fairness. Equalized Odds differs from Demographic Parity in that it also uses the true label Y of each training sample as prior knowledge, requiring independence conditional on Y. We now formally define Demographic Parity and Equalized Odds.
Definition 2
(Demographic Parity) [28] A classifier C trained on the data distribution D(X, Y, S) satisfies Demographic Parity if the classification result C(X) is independent of the sensitive attribute S, i.e., \(P(C(X) =1| S=1) = P(C(X) =1| S=0)\). This can be formulated as inequality (8): \(|P(C(X)=1 \mid S=1) - P(C(X)=1 \mid S=0)| \le \tau \), where the threshold \(\tau \) serves as the fairness constraint.
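The Demographic Parity gap and its threshold test can be computed directly from binary predictions and a binary S. The following is a minimal sketch with hypothetical function names.

```python
from typing import Dict, Sequence


def demographic_parity_gap(y_pred: Sequence[int], s: Sequence[int]) -> float:
    """|P(C(X)=1 | S=1) - P(C(X)=1 | S=0)| for binary predictions and a binary S."""
    if len(y_pred) != len(s):
        raise ValueError("y_pred and s must have the same length")
    positives: Dict[int, int] = {0: 0, 1: 0}
    totals: Dict[int, int] = {0: 0, 1: 0}
    for pred, group in zip(y_pred, s):
        totals[group] += 1
        positives[group] += pred  # counts positive predictions per group
    if totals[0] == 0 or totals[1] == 0:
        raise ValueError("both sensitive groups must be present")
    return abs(positives[1] / totals[1] - positives[0] / totals[0])


def satisfies_demographic_parity(
    y_pred: Sequence[int], s: Sequence[int], tau: float
) -> bool:
    """True if the demographic-parity gap is within the threshold tau."""
    return demographic_parity_gap(y_pred, s) <= tau
```

For example, positive rates of 2/3 in group S=1 and 1/3 in group S=0 give a gap of 1/3. That gap passes only if \(\tau \ge 1/3\).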
Definition 3
(Equalized Odds) [28] A classifier C trained on the data distribution D(X, Y, S) satisfies Equalized Odds if the classification result C(X) is conditionally independent of the sensitive attribute S given the label Y, i.e., \(P(C(X) =1| Y=y, S=1) = P(C(X) =1| Y=y, S=0)\) for \(y \in \{0,1\}\). This can be formulated as inequality (9): \(|P(C(X)=1 \mid Y=y, S=1) - P(C(X)=1 \mid Y=y, S=0)| \le \tau \) for \(y \in \{0,1\}\), where the threshold \(\tau \) serves as the fairness constraint.
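The Equalized Odds condition can be checked the same way, one label value at a time. For y = 1 the gap is the difference in true-positive rates, and for y = 0 it is the difference in false-positive rates. Requiring both gaps to be at most τ is equivalent to requiring their maximum to be at most τ, which is what this hypothetical sketch computes.

```python
from typing import Dict, Sequence


def equalized_odds_gap(
    y_pred: Sequence[int], y_true: Sequence[int], s: Sequence[int]
) -> float:
    """Largest gap, over y in {0, 1}, of |P(C(X)=1 | Y=y, S=1) - P(C(X)=1 | Y=y, S=0)|."""
    if not (len(y_pred) == len(y_true) == len(s)):
        raise ValueError("inputs must have the same length")
    worst = 0.0
    for y in (0, 1):
        rates: Dict[int, float] = {}
        for group in (0, 1):
            # Predictions for samples with true label y in this sensitive group.
            preds = [p for p, t, g in zip(y_pred, y_true, s) if t == y and g == group]
            if not preds:
                raise ValueError(f"no samples with Y={y} and S={group}")
            rates[group] = sum(preds) / len(preds)
        worst = max(worst, abs(rates[1] - rates[0]))
    return worst


def satisfies_equalized_odds(
    y_pred: Sequence[int], y_true: Sequence[int], s: Sequence[int], tau: float
) -> bool:
    """True if both the TPR gap and the FPR gap are within tau."""
    return equalized_odds_gap(y_pred, y_true, s) <= tau
```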
4 Design of FACGAN
4.1 Model Structure
Motivated by ACGAN and InfoGAN, this paper proposes FACGAN, shown in Fig. 1. It aims to ensure that no sensitive information can be inferred from either the synthetic data or the predicted outputs, and thereby to achieve better fair classification. FACGAN is composed of four separate networks: a Generator, a Discriminator, a Classifier and an Adversary.
(1) Generator (G): Given random noise z and the latent factor c, each synthetic sample G(z, c) has a corresponding sensitive attribute \(s \sim P_{real}(s)\) and label \(y \sim P_{real}(y)\), i.e., the synthetic data follow \((x_g | y, s) \sim P_g(x | y, s)\). The goal of G is to make the synthetic data \((x_g | y, s)\) close to the real data and correctly classifiable, while preventing the user's sensitive attribute s from being inferred from them. Here, the latent factor c consists of non-sensitive attributes that are helpful for label prediction and have little impact on the prediction of sensitive information. This helps ensure good classification accuracy of the generated data for downstream tasks.
(2) Discriminator (D): D distinguishes real data drawn from \(P_{real}(x | y, s)\) from synthetic data drawn from \(P_g(x | y, s)\).
(3) Classifier (C): Unlike ACGAN, where D and C share parameters, C in FACGAN is an independent network, so it can be optimized using its own loss alone. C predicts class labels correctly and fairly for both the synthetic data \((x_g | y, s)\) and the real data (x|y, s).
(4) Adversary (A): A infers the sensitive information \(s_g\) from the synthetic data \((x_g | y, s)\) and \(s_c\) from the predictions \(y_c\) of C, trying to match the user's real sensitive attribute s. The former acts as a fairness constraint on data generation: G is trained against A so that the synthetic data satisfy data fairness. The latter serves as a fairness constraint on classification: C is trained against A so that its predictions satisfy classification fairness.
4.2 Latent Factor Construction
If the generator input consists of random noise only, the generated representations have a large degree of freedom. In addition, introducing fairness constraints further reduces data utility, making it difficult to guarantee prediction accuracy for downstream tasks. Therefore, we use some non-sensitive attributes of the original samples as part of the input. However, non-sensitive attributes may carry sensitive information through correlations between attributes [29]. We therefore select as the latent factor only non-sensitive attributes that are helpful for label prediction and have low impact on the prediction of sensitive information.
Specifically, we use the ANOVA method [30] for feature selection, which can increase classification reliability. ANOVA tests whether an independent variable has a significant effect on a dependent variable by decomposing the variation into its sources and judging whether the means of different groups are equal. The total variation is split into the between-group sum of squares SSA and the within-group sum of squares SSE. The F-statistic is the ratio of their mean squares, \(F = \frac{SSA/(m-1)}{SSE/(N-m)}\) for m groups and N samples. The larger the F-value, the larger the difference between groups. A large F-value thus means the independent variable has an effect on the dependent variable, i.e., the two variables are strongly associated.
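The F-statistic above can be sketched in plain Python for one numeric attribute, grouped by a discrete variable such as the label or the sensitive attribute. The function name is hypothetical. In practice, scikit-learn's `sklearn.feature_selection.f_classif` computes the same statistic for every column of a feature matrix; which implementation the authors used is not stated.

```python
from typing import Dict, List, Sequence


def anova_f_value(values: Sequence[float], groups: Sequence[int]) -> float:
    """One-way ANOVA F-statistic of a numeric feature split by a discrete variable.

    F = (SSA / (m - 1)) / (SSE / (N - m)) for m groups and N samples.
    """
    if len(values) != len(groups):
        raise ValueError("values and groups must have the same length")
    by_group: Dict[int, List[float]] = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(float(v))
    n_total, m = len(values), len(by_group)
    if m < 2 or n_total <= m:
        raise ValueError("need at least two groups and more samples than groups")
    grand_mean = sum(values) / n_total
    means = {g: sum(vs) / len(vs) for g, vs in by_group.items()}
    # Between-group sum of squares: spread of the group means around the grand mean.
    ssa = sum(len(vs) * (means[g] - grand_mean) ** 2 for g, vs in by_group.items())
    # Within-group sum of squares: spread of each sample around its own group mean.
    sse = sum((v - means[g]) ** 2 for g, vs in by_group.items() for v in vs)
    if sse == 0:
        # Values are constant within every group: F is infinite unless the means also coincide.
        return float("inf") if ssa > 0 else float("nan")
    return (ssa / (m - 1)) / (sse / (n_total - m))


print(anova_f_value([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1]))  # 13.5
```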
Moreover, we adopt adversarial training to further remove the correlation between the latent factor and sensitive information. This reduces the probability that the generated representations encode sensitive information while preserving the prediction accuracy of downstream tasks. The process of constructing the latent factor is shown in Fig. 2 and primarily consists of the following steps.
(1) First, ANOVA is used to calculate the F-value of each non-sensitive attribute with respect to the label. The n attributes are then sorted by F-value in descending order. The t attributes most helpful for label prediction are selected as the candidate set for the latent factor according to the preset threshold \(\delta \). \(\delta \) is the label-prediction accuracy obtained when the t selected attributes are used.
(2) The F-values of the candidate attributes with respect to the sensitive attribute are calculated and sorted in ascending order. The k attributes with low impact on the prediction of sensitive information are then selected according to the preset threshold \(\varepsilon \). \(\varepsilon \) is the sensitive-attribute prediction accuracy obtained when the k selected attributes are used. These k attributes form the latent factor.
Here, \(k \le t \le n \) and \(k \le \frac{n}{2}\). The values of the hyperparameters t, k, \(\delta \) and \(\varepsilon \) can be adjusted according to the specific dataset used.
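The two ranking steps can be sketched as follows, assuming the F-values have already been computed with a one-way ANOVA as described in this subsection. As a simplification, t and k are passed in directly here, whereas the paper chooses them through the accuracy thresholds δ and ε, which would require training predictors. The function name and the attribute names in the usage line are hypothetical.

```python
from typing import Dict, List


def select_latent_factor(
    f_label: Dict[str, float],
    f_sensitive: Dict[str, float],
    t: int,
    k: int,
) -> List[str]:
    """Two-stage selection of the latent-factor attributes.

    f_label[a]:     ANOVA F-value of attribute a with respect to the label Y
    f_sensitive[a]: ANOVA F-value of attribute a with respect to the sensitive attribute S
    t, k:           chosen directly here (the paper tunes them via delta and epsilon)
    """
    n = len(f_label)
    if not (k <= t <= n and k <= n / 2):
        raise ValueError("require k <= t <= n and k <= n / 2")
    # Step (1): keep the t attributes with the largest F-values w.r.t. the label.
    candidates = sorted(f_label, key=lambda a: f_label[a], reverse=True)[:t]
    # Step (2): among the candidates, keep the k with the smallest F-values w.r.t. S.
    return sorted(candidates, key=lambda a: f_sensitive[a])[:k]


# Hypothetical F-values for four attributes a-d.
f_y = {"a": 9.0, "b": 7.0, "c": 5.0, "d": 1.0}
f_s = {"a": 8.0, "b": 0.5, "c": 0.2, "d": 0.1}
print(select_latent_factor(f_y, f_s, t=3, k=2))  # ['c', 'b']
```

Note that attribute d is never chosen even though it is least associated with S. It is filtered out in step (1) for being unhelpful to the label, which mirrors the order of the two steps above.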
4.3 Model Training
FACGAN is trained through collaborative adversarial training among D, A, G and C, with binary cross-entropy used for the adversarial losses. Here, \(\Phi \), \(\varphi \), \(\theta \) and \(\psi \) denote the network parameters of D, A, G and C, respectively.
For D, whose objective is to maximize the probability of correctly judging the data as real or fake, the loss function is:
For A, whose goal is to maximize the probability of inferring sensitive attributes from the synthetic data and the predicted class labels, the loss function is:
where \(L_{GA}(x_g, \theta , \varphi )\) and \(L_{CA}(y_c, \psi , \varphi )\) are the adversarial losses between A and G and between A and C, respectively. The hyperparameter \(\lambda \) controls the tradeoff between data utility and fairness, and the hyperparameter \(\mu \) controls the tradeoff between classification utility and fairness. The loss functions are specified as:
For C, the goal is to maximize the probability of correctly predicting the class labels of the synthetic and real data, while minimizing the probability that A infers sensitive attributes from the predicted class labels. The loss function is:
For G, the goal is to maximize the probability that the synthetic data are judged real and correctly classified, while minimizing the probability that A infers sensitive attributes from the synthetic data. The loss function is:
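The following framework-free sketch shows how the four objectives described above might be evaluated on a mini-batch. It assumes binary cross-entropy throughout, as stated at the start of this subsection. It also assumes that λ weights the term between A and G, and μ the term between A and C, in every loss where they appear. All names are hypothetical, and the paper's exact equations may differ. For example, the fairness terms could push A towards chance-level predictions instead of maximizing its cross-entropy.

```python
import math
from typing import Dict, Sequence

EPS = 1e-7  # clipping constant to avoid log(0)


def bce(probs: Sequence[float], targets: Sequence[float]) -> float:
    """Mean binary cross-entropy between predicted probabilities and 0/1 targets."""
    if not probs or len(probs) != len(targets):
        raise ValueError("probs and targets must be non-empty and equally long")
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, EPS), 1.0 - EPS)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(probs)


def facgan_losses(
    d_real: Sequence[float],  # D(x): probability that real samples are real
    d_fake: Sequence[float],  # D(x_g): probability that synthetic samples are real
    c_real: Sequence[float],  # C(x): predicted P(y=1) on real samples
    c_fake: Sequence[float],  # C(x_g): predicted P(y=1) on synthetic samples
    y_real: Sequence[int],    # true labels of the real samples
    y_fake: Sequence[int],    # labels the synthetic samples were conditioned on
    a_gen: Sequence[float],   # A's predicted P(s=1) from x_g
    s_gen: Sequence[int],     # sensitive attributes the synthetic samples were conditioned on
    a_cls: Sequence[float],   # A's predicted P(s=1) from C's predictions y_c
    s_cls: Sequence[int],     # true sensitive attributes behind those predictions
    lam: float,               # lambda: data utility vs. fairness
    mu: float,                # mu: classification utility vs. fairness
) -> Dict[str, float]:
    """Hypothetical per-network losses; each network minimizes its own entry."""
    l_ga = bce(a_gen, s_gen)  # A's error when inferring s from synthetic data
    l_ca = bce(a_cls, s_cls)  # A's error when inferring s from C's predictions
    return {
        "D": bce(d_real, [1.0] * len(d_real)) + bce(d_fake, [0.0] * len(d_fake)),
        "A": lam * l_ga + mu * l_ca,
        # C: classify real and synthetic data well while making A's job harder.
        "C": bce(c_real, y_real) + bce(c_fake, y_fake) - mu * l_ca,
        # G: fool D, be classified as its conditioning label, and hide s from A.
        "G": bce(d_fake, [1.0] * len(d_fake)) + bce(c_fake, y_fake) - lam * l_ga,
    }
```

In a real implementation these losses would be computed with an autodiff framework. Each network would update only its own parameters (\(\Phi \), \(\varphi \), \(\theta \) or \(\psi \)), with updates scheduled as in Algorithm 1.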
In the cooperative adversarial training among D, G and C, improving the utility of G can help C make correct predictions, and improving C can help G generate more realistic samples for each class label. Likewise, in the adversarial training among A, G and C, improving the fairness of G can improve C's ability to classify fairly, and improving the fairness of C can improve G's ability to generate fair data. Therefore, by jointly training the generative model and the classifier, FACGAN is expected to perform better than either an independently trained generative model or an independently trained classifier. The specific training process is shown in Algorithm 1.
5 Experimental Analysis
The experiment consists of two parts: (1) Testing the fairness of the synthetic dataset; (2) Testing the fairness of the classifier.
To verify the quality of the fair datasets generated by FACGAN, we measure the utility of a synthetic dataset by the Euclidean distance between the synthetic and real datasets. We additionally use predictive accuracy to measure its effectiveness for downstream tasks. We adopt \(\epsilon \)-Fairness as the fairness metric for synthetic datasets.
To evaluate the classification effectiveness of FACGAN, we use prediction accuracy and the fairness criteria Demographic Parity and Equalized Odds as metrics for the trained models on the real datasets.
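The text does not specify over which representation the Euclidean distance between datasets is taken. One plausible reading, assumed in this sketch, compares the empirical distributions of discretized records. Each record (for example a tuple of category codes for x, y and s) is counted, and the distance is the L2 norm of the difference between the two frequency vectors. The function name and toy records are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def distribution_distance(real: Sequence[Hashable], synthetic: Sequence[Hashable]) -> float:
    """L2 distance between the empirical distributions of two sets of discrete records."""
    if not real or not synthetic:
        raise ValueError("both datasets must be non-empty")
    p, q = Counter(real), Counter(synthetic)
    support = set(p) | set(q)  # records seen in either dataset; missing counts are 0
    return math.sqrt(sum((p[r] / len(real) - q[r] / len(synthetic)) ** 2 for r in support))


real = [(0, 1, 0), (0, 1, 0), (1, 0, 1), (1, 0, 1)]
synthetic = [(0, 1, 0), (0, 1, 0), (0, 1, 0), (1, 0, 1)]
print(round(distribution_distance(real, synthetic), 4))  # 0.3536
```

Continuous attributes would have to be binned first for this reading to apply, and the result then depends on the chosen bins.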
Since existing GAN-based models focus on different aspects, we compare FACGAN with three baselines, each on the aspect it addresses:
(a) ACGAN [25], which does not consider fairness, is compared on data quality.
(b) FairGAN [10], which generates fair data, is compared on fair data generation.
(c) FairGAN\(^+\) [16], which does not preprocess its input, is compared on both fair data generation and fair classification.
In addition, a classifier trained on the unprocessed dataset serves as the Baseline. All classifiers built in ACGAN, FairGAN\(^+\) and FACGAN are logistic regression models.
5.1 Dataset Processing
We use the UCI Adult Income [31] and ProPublica COMPAS [32] datasets to verify the effectiveness of the proposed model. Both datasets exhibit serious discrimination: by gender in Adult Income and by race in COMPAS.
UCI Adult Income This dataset contains basic information about US residents at the time of collection and their annual income, and can be used to predict whether a person earns more than 50K per year. It includes 48,842 samples (45,222 after deleting records containing "?"), each consisting of 15 attributes. The sensitive attribute is "gender", where male is the protected group and female is the unprotected group. The positive class label denotes an annual income over 50K.
ProPublica COMPAS This dataset records the results of the AI software COMPAS in predicting whether more than 10,000 defendants in Florida would re-offend within 2 years. COMPAS annotates defendants by asking them 137 questions. The dataset contains 11,757 samples (6,150 after deleting records containing null or abnormal values), each described by 52 attributes (12 after data preprocessing). The sensitive attribute is "race", divided into white (specifically Caucasian) and non-white (including black, African-American, Hispanic, Latino, Asian and other races; only African-Americans are considered in our experiments). White is the protected group and non-white is the unprotected group. The positive class label denotes a defendant who does not re-offend within 2 years.
5.1.1 Examples of Latent Factor Construction
As required by the proposed model, the latent factor is constructed during data preprocessing from non-sensitive attributes that are helpful for class label prediction and have low impact on the prediction of sensitive information. We therefore first calculate the F-value of each non-sensitive attribute with respect to the label and to the sensitive attribute using ANOVA [30], and rank the attributes by F-value.

For the Adult Income dataset, the F-values of the n (\(n=13\)) non-sensitive attributes with respect to the label Income and the sensitive attribute Gender are shown in Fig. 3a. We first sort the attributes in descending order of their F-value with respect to the label. For each t in turn, we then compute the accuracy obtained when only the best t attributes (those with the largest F-values) are retained; the results are shown on the left of Fig. 4a. Here, t ranges over \(6 \le t \le 13 \). To ensure high accuracy, we set the threshold \(\delta =0.87\) and obtain t (t=8) attributes that are helpful for label prediction: \(\{\)‘educational-num (en)’, ‘relationship (rs)’, ‘age (ae)’, ‘hours-per-week (hpw)’, ‘capital-gain (cg)’, ‘marital-status (ms)’, ‘capital-loss (cl)’, ‘education (et)’\(\}\).

Next, for each k in turn, we compute the accuracy of predicting the sensitive attribute when only the best k attributes (those with the smallest F-values with respect to Gender) are retained; the results are shown on the right of Fig. 4a. To keep this accuracy low, we set the threshold \(\varepsilon =0.68\) and remove the two attributes most correlated with the sensitive attribute, \(\{\)‘relationship (rs)’, ‘hours-per-week (hpw)’\(\}\). The final latent factor c consists of k (\(k=6\)) non-sensitive attributes, i.e., \(c = \{\)‘educational-num (en)’, ‘age (ae)’, ‘capital-gain (cg)’, ‘marital-status (ms)’, ‘capital-loss (cl)’, ‘education (et)’\(\}\).

For the COMPAS dataset, the F-values of the n (\(n=10\)) non-sensitive attributes are shown in Fig. 3b. Similarly, we sort the attributes in descending order of their F-value with respect to the label. For each t in turn, we compute the accuracy obtained when only the best t attributes (those with the largest F-values) are retained; the results are shown on the left of Fig. 4b. Here, t ranges over \(4 \le t \le 10 \). To ensure high accuracy, we set the threshold \(\delta =0.73\) and obtain t (t=7) attributes that are helpful for label prediction: \(\{\)‘is_violent_recid (ivr)’, ‘decile_score (ds)’, ‘priors_count (pc)’, ‘age (ae)’, ‘juv_other_count (joc)’, ‘juv_misd_count (jmc)’, ‘sex (sx)’\(\}\).

Next, for each k in turn, we compute the accuracy of predicting the sensitive attribute when only the best k attributes (those with the smallest F-values) are retained; the results are shown on the right of Fig. 4b. To keep this accuracy low, we set the threshold \(\varepsilon =0.63\) and remove the three attributes most correlated with the sensitive attribute, \(\{\)‘decile_score (ds)’, ‘priors_count (pc)’, ‘age (ae)’\(\}\). The final latent factor c thus consists of k (\(k=4\)) non-sensitive attributes, i.e., \(c = \{\)‘is_violent_recid (ivr)’, ‘juv_other_count (joc)’, ‘juv_misd_count (jmc)’, ‘sex (sx)’\(\}\).
Both datasets are tabular, so each sample contains both categorical and continuous attributes. In the original dataset R, categorical attributes are one-hot encoded, and continuous attributes are scaled to the interval \( [-1,1] \) by normalization. For the dataset \(R'\), the dimensions corresponding to the attributes in the latent factor c are filled with each original sample's values, encoded in the same way. The remaining dimensions are filled with random noise \(z \sim P_z\).
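A minimal sketch of this preprocessing for a single record follows. It rests on three assumptions: min-max scaling to [-1, 1] (the exact normalization is not named), standard-normal noise for z, and latent-factor dimensions placed before the noise dimensions. The function names, the toy record and the total input width are hypothetical.

```python
import random
from typing import List, Sequence


def one_hot(value: str, categories: Sequence[str]) -> List[float]:
    """One-hot encode a categorical value against a fixed category list."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


def scale_to_pm1(x: float, lo: float, hi: float) -> float:
    """Min-max scale x from [lo, hi] to [-1, 1] (assumed normalization)."""
    if hi == lo:
        return 0.0
    return 2.0 * (x - lo) / (hi - lo) - 1.0


def generator_input(latent: List[float], input_dim: int, rng: random.Random) -> List[float]:
    """Row of R': encoded latent-factor values followed by random-noise dimensions."""
    if len(latent) > input_dim:
        raise ValueError("latent factor is wider than the generator input")
    noise = [rng.gauss(0.0, 1.0) for _ in range(input_dim - len(latent))]
    return latent + noise


# Hypothetical record: one continuous and one categorical latent-factor attribute.
latent = [scale_to_pm1(5.0, 0.0, 10.0)] + one_hot("b", ["a", "b", "c"])
row = generator_input(latent, input_dim=8, rng=random.Random(0))
print(len(row), row[:4])  # 8 [0.0, 0.0, 1.0, 0.0]
```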
5.1.2 Dataset Construction
To explore how training datasets with different distributions affect the accuracy and fairness of the proposed model, we additionally modify the Adult Income dataset. Changing the distribution of samples over the sensitive attribute ‘gender’ and the label ‘income’ yields three real datasets.
(1) Imbalanced-real dataset (Imba-real dataset)
Original sensitive-attribute counts: 30,527 male and 14,695 female samples. Original label counts: 11,208 samples with income \(>50K\) and 34,014 with income \(\le 50K\).
(2) Gender balanced-real dataset (GDba-real dataset)
Male samples are randomly removed from the original dataset until the two genders are balanced, with 14,695 samples each.
(3) Income balanced-real dataset (ICba-real dataset)
Samples with income \(\le 50K\) (the majority class) are randomly removed from the original dataset until the two income classes are balanced, with 11,208 samples each.
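Both balanced datasets can be obtained by random undersampling of the majority group. The sketch below is an assumption about how this might be done (the paper does not give code); `undersample` is a hypothetical helper and rows are plain dictionaries.

```python
import random
from collections import defaultdict


def undersample(rows, key, rng):
    """Randomly drop rows from larger groups until every value of `key`
    has as many rows as the smallest group."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r)
    n_min = min(len(g) for g in groups.values())
    balanced = []
    for g in groups.values():
        balanced.extend(rng.sample(g, n_min))
    rng.shuffle(balanced)
    return balanced
```

Applied to the Adult Income counts above, balancing on gender would keep 2 × 14,695 = 29,390 rows, and balancing on income would keep 2 × 11,208 = 22,416 rows.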
FACGAN is compared with the other methods on each of the three real datasets above. Each real dataset is randomly split into a training set (2/3 of the samples) and a testing set (1/3). The synthetic datasets generated from the three real datasets are:
(1) Imbalanced-synthetic dataset (Imba-synthetic dataset): with the same number of both sensitive attribute samples and label samples as the Imba-real dataset.
(2) Gender balanced-synthetic dataset (GDba-synthetic dataset): with the same number of sensitive attribute samples as the GDba-real dataset.
(3) Income balanced-synthetic dataset (ICba-synthetic dataset): with the same number of label samples as the ICba-real dataset.
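The 2/3 to 1/3 split and the per-group counts that each synthetic dataset is meant to match can be sketched as below. This is an illustrative sketch; the helper names are hypothetical and rows are assumed to be dictionaries.

```python
import random
from collections import Counter


def train_test_split(rows, rng, train_frac=2 / 3):
    """Shuffle a copy of `rows` and split it into training and testing parts."""
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def target_counts(rows, sensitive, label):
    """Per-value counts of the sensitive attribute and of the label, i.e. the
    numbers a synthetic dataset would need to reproduce."""
    return Counter(r[sensitive] for r in rows), Counter(r[label] for r in rows)
```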
5.2 Experimental Configurations
For the Adult Income dataset, the generator G has two hidden layers of 128 neurons each. The discriminator D and the adversary A each have one hidden layer with 256 neurons. The classifier C has two hidden layers, with 64 neurons in the first and 32 in the second. Hidden layers use the ReLU activation and output layers use the sigmoid activation. For the COMPAS dataset, the generator G has two hidden layers, with 64 neurons in the first and 32 in the second; the discriminator D and the adversary A each have one hidden layer with 64 neurons; and the classifier C has two hidden layers, with 32 neurons in the first and 16 in the second.
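The hidden-layer widths above can be written down as a configuration, together with a framework-free multilayer perceptron that uses ReLU in hidden layers and a sigmoid output. This is only a sketch: the paper does not state its framework or initialization, so the uniform initialization, the input size used in the usage example, and the helper names are assumptions.

```python
import math
import random

# Hidden-layer widths from the text; input/output sizes depend on the encoding.
ARCH = {
    "adult": {"G": [128, 128], "D": [256], "A": [256], "C": [64, 32]},
    "compas": {"G": [64, 32], "D": [64], "A": [64], "C": [32, 16]},
}


def init_mlp(in_dim, hidden, out_dim, rng):
    """List of (weights, biases) per layer with a small uniform random init."""
    sizes = [in_dim] + list(hidden) + [out_dim]
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        bound = 1.0 / math.sqrt(fan_in)
        w = [[rng.uniform(-bound, bound) for _ in range(fan_in)] for _ in range(fan_out)]
        layers.append((w, [0.0] * fan_out))
    return layers


def forward(layers, x):
    """ReLU on hidden layers, sigmoid on the output layer."""
    for i, (w, b) in enumerate(layers):
        z = [sum(wi * xi for wi, xi in zip(row, x)) + bj for row, bj in zip(w, b)]
        last = i == len(layers) - 1
        x = [1.0 / (1.0 + math.exp(-v)) if last else max(0.0, v) for v in z]
    return x


def n_params(layers):
    """Total number of weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)
```

For instance, a COMPAS classifier on a hypothetical 10-dimensional input, `init_mlp(10, ARCH["compas"]["C"], 1, random.Random(0))`, has 897 parameters and outputs a single probability.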
All hyperparameters are tuned by a broad grid search. We first set the search range of each hyperparameter manually: for the learning rate, the grid spans [1e-5, 1e-2] with the step size selected from {2e-5, 5e-5, 1e-4, 2e-4}; for the number of epochs, the grid spans [500, 5000] with the step size chosen from {20, 50, 100, 150, 200}; and the batch size is chosen from {16, 32, 64, 128, 256, 512}. The selected values are 3000 epochs and a learning rate of 1e-4, with a batch size of 128 on the Adult Income dataset and 64 on the COMPAS dataset.
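A grid search over such ranges can be sketched as below. This is a generic illustration, not the authors' tuning script: `evaluate` is a hypothetical callback that trains a model with a configuration and returns a score to maximize.

```python
import itertools
import math


def linear_grid(lo, hi, step):
    """Values lo, lo + step, ... not exceeding hi (small tolerance for float error)."""
    n = math.floor((hi - lo) / step + 1e-9) + 1
    return [lo + i * step for i in range(n)]


def grid_search(space, evaluate):
    """Return the configuration (dict) with the highest `evaluate` score."""
    names = list(space)
    best_cfg, best_score = None, -math.inf
    for values in itertools.product(*(space[n] for n in names)):
        cfg = dict(zip(names, values))
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg
```

With a step of 2e-4 for the learning rate, 200 for the epochs, and the six batch sizes, the space `{"lr": linear_grid(1e-5, 1e-2, 2e-4), "epochs": linear_grid(500, 5000, 200), "batch_size": [16, 32, 64, 128, 256, 512]}` already contains 50 × 23 × 6 = 6,900 configurations, which is why such a search is expensive.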
In addition, because the hyperparameters \(\lambda \) and \(\mu \) play a key role in the fairness and utility of the model, we further analyze their impact on the final results by setting each of them to values in \(\{0.5, 1, 1.5, 2\}\). For each hyperparameter value, we run ten training sessions, discard the best and worst runs, and average the remaining results. The hardware and software configurations used for the experiments are detailed in Table 1.
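The averaging rule above (ten runs, drop the best and the worst, average the rest) is a trimmed mean. A minimal sketch, with a hypothetical function name:

```python
def trimmed_run_mean(scores):
    """Average of repeated runs after dropping the single highest and lowest score.

    Sorting removes the extremes regardless of whether higher or lower is
    better for the metric, so the same rule works for accuracy and for gaps.
    """
    if len(scores) < 3:
        raise ValueError("need at least three runs")
    kept = sorted(scores)[1:-1]
    return sum(kept) / len(kept)
```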
5.3 Analysis of Fair Datasets
5.3.1 Analysis of Latent Factors
The proposed adversarial fair representation model (FACGAN) can be viewed as an encoder of the original samples: through the game between the generator and the adversary, it eliminates the strong correlation between non-sensitive and sensitive attributes in the generated samples, and thereby removes the implicit bias caused by correlations between attributes.
To verify this claim, we extract the dimensions of the generated representation that encode the latent factor and use them to predict the label and the sensitive attribute, respectively. The results are shown in Fig. 5, where ‘income’ and ‘gender’ are the label and the sensitive attribute of the Adult Income dataset, and ‘is_recid’ and ‘race’ are the label and the sensitive attribute of the COMPAS dataset.
As the number of adversarial training iterations increases, the accuracy of predicting both the label and the sensitive attribute from the latent factor decreases. First, this indicates that our method reduces the correlation between non-sensitive attributes and sensitive information, which gives the fair representation generation some degree of interpretability. Second, fairness and utility are in tension: the fairer the generated representation, the lower its utility, which is why the accuracy of the latent factor in predicting the label also decreases.
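The probing experiment described above can be sketched as follows. The paper does not specify which classifier predicts the label and the sensitive attribute from the latent-factor dimensions, so this sketch swaps in a simple nearest-centroid probe; the slice boundaries and all function names are hypothetical.

```python
def latent_slice(vectors, start, stop):
    """Keep only the dimensions [start, stop) that encode the latent factor c."""
    return [v[start:stop] for v in vectors]


def fit_centroids(xs, ys):
    """Per-class mean vector (a nearest-centroid probe)."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        total = sums.setdefault(y, [0.0] * len(x))
        for i, xi in enumerate(x):
            total[i] += xi
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in total] for y, total in sums.items()}


def probe_accuracy(centroids, xs, ys):
    """Fraction of samples whose nearest centroid is their true class."""
    def nearest(x):
        return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))
    return sum(nearest(x) == y for x, y in zip(xs, ys)) / len(ys)
```

Running the probe once with the label and once with the sensitive attribute as `ys`, on held-out generated samples at successive training iterations, would trace accuracy curves of the kind shown in Fig. 5.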
5.3.2 Comparison of Synthetic Datasets
To assess the quality of the synthetic datasets, we use \(\epsilon \)-Fairness as the fairness metric, measured by the balanced error rate of a classifier C that predicts the sensitive attribute S from X: \(BER(C(X), S) = [P(C(X) =0|S=1) + P(C(X) =1|S=0)] / 2\); a higher BER means that S is harder to recover, i.e., the dataset is fairer. Data utility is measured by the Euclidean distance between the synthetic and the real datasets: \(ED(X, S)=||P_{real}(X, S) - P_g(X, S)||_2\).
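Both metrics can be computed directly from predictions and records. This sketch assumes binary S with predictions in {0, 1} and assumes the records are discrete and hashable so that the empirical distributions \(P_{real}\) and \(P_g\) can be estimated by counting (continuous attributes would need to be discretized first); the function names are hypothetical.

```python
import math
from collections import Counter


def balanced_error_rate(pred_s, s):
    """BER = [P(C(X)=0 | S=1) + P(C(X)=1 | S=0)] / 2 for predictions of S."""
    s1 = [p for p, si in zip(pred_s, s) if si == 1]
    s0 = [p for p, si in zip(pred_s, s) if si == 0]
    miss_on_1 = sum(p == 0 for p in s1) / len(s1)
    miss_on_0 = sum(p == 1 for p in s0) / len(s0)
    return (miss_on_1 + miss_on_0) / 2


def empirical_distance(real, synthetic):
    """L2 distance between the empirical distributions of discrete (x, s) records."""
    p, q = Counter(real), Counter(synthetic)
    keys = set(p) | set(q)
    return math.sqrt(sum((p[k] / len(real) - q[k] / len(synthetic)) ** 2 for k in keys))
```

A BER of 0.5 means the classifier does no better than chance at recovering S, while a BER of 0 means S is perfectly predictable.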
We first compare the data utility and fairness of the datasets generated by ACGAN, FairGAN, FairGAN\(^+\) and FACGAN; the results are shown in Table 2. D1 is the real dataset without any processing: the Adult Income (Imba-real dataset) in Table 2a, the Adult Income (GDba-real dataset) in Table 2b, the Adult Income (ICba-real dataset) in Table 2c, and the COMPAS dataset in Table 2d. D2 is generated by ACGAN, and D3 by FairGAN with \(\lambda =1\). D4 and D5 are generated by FairGAN\(^+\) (\(\lambda =1\) and \(\mu =1\)) under Demographic Parity and Equalized Odds, respectively. D6 and D7 are generated by FACGAN (\(\lambda =1\) and \(\mu =1\)) under Demographic Parity and Equalized Odds, respectively.
Table 2 shows that the dataset generated by ACGAN has good data utility but lower fairness, since ACGAN is unaware of \(\epsilon \)-Fairness. In contrast, the datasets generated by FairGAN, FairGAN\(^+\) and FACGAN yield a higher BER when the real sensitive attribute S is predicted, i.e., better fairness than both D1 and D2. In particular, Table 2a, b and d show that the data generated by FACGAN yield a higher error rate than those of the other methods when predicting S. Because FACGAN removes, at the input, the sensitive attributes and the non-sensitive attributes that strongly predict sensitive information, its synthetic data are less likely to encode sensitive attributes and are therefore fairer. Moreover, because the generator of FACGAN retains the non-sensitive attributes that help predict the label, its synthetic data achieve higher classification accuracy than those of the approaches that generate data from random noise alone.
Moreover, the most significant finding is that the distribution of samples over the label is crucial to the performance of the adversary. Table 2c shows that, unlike the results on the other datasets, label-balanced data substantially improve the fairness of FairGAN and FairGAN\(^+\) but also reduce their data utility. That is, they predict the sensitive attribute S with a higher error rate, but their Euclidean distance from the original dataset grows and their classification accuracy drops considerably, so the usefulness of the synthetic data cannot be guaranteed. FACGAN is also affected by the label-balanced data, with lower classification accuracy than on the other datasets; however, because the non-sensitive attributes that help predict the label are retained in the model input, it still improves the fairness of the synthetic dataset while maintaining good data utility and classification accuracy.
To investigate the effect of the hyperparameter \(\lambda \) on the trained model, we fix \(\mu = 1\) and vary \(\lambda \) on the Adult Income dataset, comparing the data utility and fairness of the datasets generated by FairGAN\(^+\) and FACGAN. The results are shown in Figs. 6, 7, and 8. For all three synthetic datasets with different distributions, fairness increases with \(\lambda \), while the Euclidean distance between the synthetic and the real dataset also increases, i.e., data utility decreases as \(\lambda \) grows. However, across the values of \(\lambda \), all three synthetic datasets yield fairly stable classification accuracy on the classifiers trained by FairGAN\(^+\) and FACGAN, and FACGAN generally achieves the better classification accuracy.
5.4 Analysis of Fair Classification
To evaluate classification on the real datasets, we use classification accuracy together with two notions of classification fairness, Demographic Parity and Equalized Odds, measured as follows (smaller values indicate fairer classification):
(1) \(DP(C) = |P(C(X)=1|S=1) - P(C(X)=1|S=0)|\)
(2) \(Eos1(C) = |P(C(X)=1|Y=1, S=1) - P(C(X)=1|Y=1, S=0)|\)
(3) \(Eos2(C) = |P(C(X)=1|Y=0, S=1) - P(C(X)=1|Y=0, S=0)|\)
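The three fairness gaps above can be computed directly from binary predictions, labels and sensitive attributes. A minimal sketch with hypothetical function names:

```python
def rate_positive(pred, mask):
    """P(C(X) = 1) over the samples selected by `mask`."""
    chosen = [p for p, m in zip(pred, mask) if m]
    return sum(chosen) / len(chosen)


def demographic_parity_gap(pred, s):
    """DP(C) = |P(C(X)=1 | S=1) - P(C(X)=1 | S=0)|."""
    return abs(rate_positive(pred, [si == 1 for si in s]) - rate_positive(pred, [si == 0 for si in s]))


def equalized_odds_gaps(pred, y, s):
    """(Eos1, Eos2): gaps in P(C(X)=1) between S=1 and S=0 given Y=1 and Y=0."""
    def gap(label):
        g1 = rate_positive(pred, [yi == label and si == 1 for yi, si in zip(y, s)])
        g0 = rate_positive(pred, [yi == label and si == 0 for yi, si in zip(y, s)])
        return abs(g1 - g0)
    return gap(1), gap(0)
```

For example, with predictions `[1, 0, 1, 1, 0, 0]`, sensitive attributes `[1, 1, 1, 0, 0, 0]` and labels `[1, 0, 1, 1, 0, 1]`, DP(C) is 1/3, Eos1(C) is 0.5 and Eos2(C) is 0.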
5.4.1 Analysis of Adult Income Dataset
(1) Fair classification of FACGAN classifiers
To verify whether joint adversarial training of the generator and the classifier achieves better fair classification, we train FACGAN with two hyperparameter settings, (\(\lambda =1\) and \(\mu =0\)) and (\(\lambda =1\) and \(\mu =1\)). The former imposes fairness constraints only on the generator, whereas the latter imposes them on both the generator and the classifier. We then compare the classification fairness and accuracy of the classifiers obtained from these two settings on real datasets with different distributions; the results are shown in Table 3. C1 is the result of classifying the real dataset with the Baseline classifier, where the real dataset is the Adult Income (Imba-real dataset) in Table 3a, the Adult Income (GDba-real dataset) in Table 3b, and the Adult Income (ICba-real dataset) in Table 3c. C2 is the result of classifying the real dataset with the FACGAN (\(\lambda =1\) and \(\mu =0\)) classifier; C3 and C4 are the results of classifiers trained with FACGAN (\(\lambda =1\) and \(\mu =1\)), where C3 satisfies Demographic Parity and C4 satisfies Equalized Odds.
Table 3 shows that the DP(C) of C2 is much smaller than that of C1, while its Eos1(C) is larger than that of C1. Hence, a classifier trained only on fair data can achieve Demographic Parity, but Equalized Odds cannot be guaranteed. Moreover, the DP(C) of C3 is smaller than that of C2, and the Eos(C) of C4 decreases significantly. This indicates that jointly training the generator with the classifier yields better fair classification, and that applying the Equalized Odds constraint directly to the classifier appears to be more effective.
(2) Fair classification of real datasets
In addition, with the hyperparameters fixed at \(\lambda =1\) and \(\mu =1\), we compare the classification fairness and accuracy of the classifiers trained by Baseline, FairGAN\(^+\) and FACGAN on real datasets with different distributions. The results are shown in Table 4. C1 is the result of classifying the real dataset with the Baseline classifier, where the real dataset is the same as in Table 3. C2 and C3 are the results of the FairGAN\(^+\) classifiers, where C2 satisfies Demographic Parity and C3 satisfies Equalized Odds. C4 and C5 are the results of the FACGAN classifiers, where C4 satisfies Demographic Parity and C5 satisfies Equalized Odds.
As Table 4 shows, the classifiers trained by FairGAN\(^+\) and FACGAN are fairer than the Baseline on real datasets with different distributions, although introducing fairness constraints also reduces classification accuracy. Overall, FACGAN achieves better classification fairness than FairGAN\(^+\), while both maintain good classification accuracy.
Furthermore, to investigate the effect of the hyperparameter \(\mu \) on the trained model, we fix \(\lambda =1\) on the Adult Income dataset and vary \(\mu \), comparing the classification fairness and accuracy of the classifiers obtained by Baseline, FairGAN\(^+\) and FACGAN on real datasets with different distributions. The results are shown in Figs. 9, 10 and 11. Classification fairness improves as \(\mu \) increases, while classification accuracy decreases. Therefore, the values of \(\lambda \) and \(\mu \) should be chosen to trade off data utility, classification accuracy and fairness, i.e., to improve the fairness of both data and classification as much as possible while keeping data utility and classification accuracy at a level acceptable to the user.
5.4.2 Analysis of COMPAS Dataset
On the COMPAS dataset, we likewise first verify that a classifier trained only on fair data can achieve fair classification; the results are shown in Table 5. C1 is the result of classifying the real dataset with the Baseline classifier, and C2 is the result of the classifier trained by FACGAN (\(\lambda =1\) and \(\mu =0\)). C3 and C4 are the results of the classifiers trained by FACGAN (\(\lambda =1\) and \(\mu =1\)), where C3 satisfies Demographic Parity and C4 satisfies Equalized Odds. As on the Adult Income dataset, C2, trained on fair data, satisfies Demographic Parity but not Equalized Odds, and applying fairness constraints to the classifier yields more effective classification fairness.
We also fix \(\lambda =1\) and \(\mu =1\) and compare the classification fairness and accuracy of the classifiers obtained by Baseline, FairGAN\(^+\) and FACGAN on the real dataset; the results are shown in Table 6. As on the Adult Income dataset, the classifiers trained by FairGAN\(^+\) and FACGAN are fairer than the Baseline, at the cost of lower classification accuracy. Overall, FACGAN achieves better classification fairness than FairGAN\(^+\), while both maintain good classification utility.
6 Conclusion
In this paper, we proposed an information-minimizing generative adversarial network, called FACGAN, which not only generates fair representations for constructing fair datasets but also trains a classifier to achieve fair classification. The design has the following properties. First, a classifier is introduced into the original GAN to improve the generator's ability to generate realistic data samples with different class labels. Second, an adversary is introduced to train the generator to generate fair representations and the classifier to make fair classifications. In addition, FACGAN removes, at the input, the sensitive attributes and the non-sensitive attributes most strongly associated with them, which effectively reduces the accuracy loss and improves the results of fair adversarial training. The experimental results show that the generator and classifier obtained from this collaborative adversarial training achieve better fairness in both data and classification.
In future work, we will consider another source of bias in model predictions: imbalanced data, where the model attains higher classification accuracy on majority-class samples. We plan to use GANs to augment minority-class samples so as to balance the original dataset, making the predictions fairer while maintaining good classification accuracy.
Availability of Supporting Data
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.
Notes
The code will be released at https://github.com/carriecql/FACGAN.
References
Mehrabi N, Morstatter F, Saxena N, Lerman K, Galstyan A (2022) A survey on bias and fairness in machine learning. ACM Comput Surv 54(6):1–35
Alabi D, Immorlica N, Kalai A (2018) Unleashing linear optimizers for group-fair learning and optimization. Conference on learning theory. PMLR, 2043–2066
Algaba A, Mazijn C, Prunkl C, Danckaert J, Ginis V (2022) LUCID-GAN: conditional generative models to locate unfairness. Available at SSRN
Zhang L, Wu Y, Wu X (2017) A causal framework for discovering and removing direct and indirect discrimination. Proceedings of the 26th international joint conference on artificial intelligence (IJCAI’17), pp 3929–3935
Calmon F, Wei D, Vinzamuri B, Natesan Ramamurthy K, Varshney KR (2017) Optimized pre-processing for discrimination prevention. Advances in neural information processing systems, pp 3992–4001
Song C, Shmatikov V (2019) Overlearning reveals sensitive attributes, arXiv preprint arXiv:1905.11742
Goodfellow IJ, Pouget-Abadie J, Mirza M et al. (2014) Generative adversarial nets. NIPS
Brock A, Donahue J, Simonyan K (2018) Large scale GAN training for high fidelity natural image synthesis. International conference on learning representations, arXiv preprint arXiv:1809.11096
Brown T, Mann B, Ryder N et al (2020) Language models are few-shot learners. Adv Neural Inf Process Syst 33:1877–1901
Xu D, Yuan S, Zhang L et al. (2018) FairGAN: fairness-aware generative adversarial networks. 2018 IEEE international conference on big data (Big Data). IEEE, pp 570–575
Kairouz P, Liao J, Huang C et al (2022) Generating fair universal representations using adversarial models. IEEE Trans Inf Forensics Secur 17:1970–1985
Xu D, Wu Y, Yuan S et al. (2019) Achieving causal fairness through generative adversarial networks. Proceedings of the 28th international joint conference on artificial intelligence
Ngxande M, Tapamo JR, Burke M (2020) Bias remediation in driver drowsiness detection systems using generative adversarial networks. IEEE Access 8:55592–55601
Sattigeri P, Hoffman SC, Chenthamarakshan V et al (2019) Fairness GAN: Generating datasets with fairness properties using a generative adversarial network. IBM J Res Dev 63(4/5):3:1-3:9
Adeli E, Zhao Q, Pfefferbaum A et al. (2021) Representation learning with statistical independence to mitigate bias. Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 2513–2523
Xu D, Yuan S, Zhang L et al. (2019) FairGAN+: achieving fair data generation and classification through generative adversarial nets. 2019 IEEE international conference on big data (Big Data). IEEE, pp 1401–1406
Zafar MB, Valera I, Gomez Rodriguez M et al. (2017) Fairness constraints: mechanisms for fair classification. Artificial intelligence and statistics. PMLR, pp 962–970
Hardt M, Price E, Srebro N (2016) Equality of opportunity in supervised learning. Advances in neural information processing systems, p 29
Beutel A, Chen J, Zhao Z et al. (2017) Data decisions and theoretical implications when adversarially learning fair representations, arXiv preprint arXiv:1707.00075
Zhang BH, Lemoine B, Mitchell M (2018) Mitigating unwanted biases with adversarial learning. Proceedings of the 2018 AAAI/ACM conference on AI, ethics, and society, pp 335–340
Abusitta A, Aimeur E, Abdel Wahab O (2020) Generative adversarial networks for mitigating biases in machine learning systems. ECAI 2020. IOS Press, pp 937–944
Delobelle P, Temple P, Perrouin G et al (2021) Ethical adversaries: towards mitigating unfairness with adversarial machine learning. ACM SIGKDD Explorations Newsl 23(1):32–41
Dhar P, Gleason J, Souri H et al. (2020) Towards gender-neutral face descriptors for mitigating bias in face recognition, arXiv preprint arXiv:2006.07845
Han M, Wu J, Bashir AK et al. (2020) Adversarial learning-based bias mitigation for fatigue driving detection in fair-intelligent IoV. GLOBECOM 2020-2020 IEEE global communications conference. IEEE, pp 1–6
Odena A, Olah C, Shlens J (2017) Conditional image synthesis with auxiliary classifier GANs. International conference on machine learning. PMLR, pp 2642–2651
Chen X, Duan Y, Houthooft R et al. (2016) InfoGAN: interpretable representation learning by information maximizing generative adversarial nets. Advances in neural information processing systems, p 29
Feldman M, Friedler SA, Moeller J et al. (2015) Certifying and removing disparate impact. Proceedings of the 21st ACM SIGKDD international conference on knowledge discovery and data mining, pp 259–268
Hardt M, Price E, Srebro N (2016) Equality of opportunity in supervised learning. Advances in neural information processing systems, p 29
Jiang H, Nachum O (2020) Identifying and correcting label bias in machine learning. In international conference on artificial intelligence and statistics. PMLR, pp 702–712
Shaw RG, Mitchell-Olds T (1993) ANOVA for unbalanced data: an overview. Ecology 74(6):1638–1645
Dua D, Karra Taniskidou E (2017) UCI machine learning repository. University of California, Irvine
Zafar MB, Valera I, Gomez Rodriguez M et al. (2017) Fairness beyond disparate treatment and disparate impact: Learning classification without disparate mistreatment. Proceedings of the 26th international conference on World Wide Web. International World Wide Web conferences steering committee, pp 1171–1180
Acknowledgements
This work is supported partially by the National Natural Science Foundation of China [61972096, 61771140, 61872088, 61872090, 61902289], and the University-Industry Cooperation of Fujian Province [2022H6025].
Funding
Not Applicable
Author information
Contributions
Qiuling Chen: Methodology, Formal analysis, and Writing-Original Draft; Ayong Ye: Conceptualization, Resources, Writing-Review & Editing and Funding acquisition; Yuexin Zhang: Writing-Reviewing and Editing; Jianwei Chen: Writing-Reviewing and Editing; Chuan Huang: Supervision.
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of the article.
Ethical Approval
Not Applicable
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Chen, Q., Ye, A., Zhang, Y. et al. Information-Minimizing Generative Adversarial Network for Fair Generation and Classification. Neural Process Lett 56, 36 (2024). https://doi.org/10.1007/s11063-024-11457-8
DOI: https://doi.org/10.1007/s11063-024-11457-8