1 Introduction

Protecting fairness in a machine learning model means measuring and eliminating discrimination in the model and ensuring the applications built around the model are trustworthy. The aim is to prevent the model from making significantly different predictions for different subgroups, where the subgroups are defined by a “sensitive feature”, such as race, gender, or age. The phenomenon of unfairness has been observed frequently across the machine learning field. For example, software used to support recruitment and hiring decisions has been found to be discriminatory [1], and severe gender bias has been reported in Amazon’s AI-based CV screening [2]. In fact, machine learning models are so widely deployed in our society that, without fairness protection, the impacts of discrimination may prove catastrophic [3,4,5,6,7,8].

Fortunately, studies on fairness have a long history [9], so there is much documented evidence of not only the biases that lead to unfair predictions [10,11,12] but also a plethora of approaches to overcome those biases. Suggested strategies include data sampling, re-weighting, and modification methods to enforce equal predictions across subgroups, along with different fairness metrics to measure the differences in predictions.

Most previous articles on fairness have focused on numerical or tabular inputs. However, a growing number of studies deal with fairness protection for image inputs. Rapid developments in deep learning have seen various image datasets emerge, such as ImageNet [13] and KITTI [14], and, as in the past, unfairness and discrimination have again been observed with image inputs and deep model deployments. Buolamwini and Gebru [15] find that commercial face recognition systems suffer significant prediction gaps across populations, while Brandao [16] finds age and gender biases among pedestrian detection algorithms. Unfair predictions jeopardize model performance for minorities and lead to negative social impacts. Moreover, this phenomenon has also raised concerns about the training of deep models, in that they tend to learn short-cut features that are irrelevant to the learning targets. Thus, the central issue in fairness protection for images is to break this short-cut learning so as to avoid unfair predictions.

It is natural to adopt previous methods for fairness protection with image inputs. However, traditional value modification methods do not carry over. As discussed in [17, 18], an image feature, such as race, cannot simply be modified from one attribute value to another, and enforcing fairness with other traditional methods is inefficient compared to deep model methods [19]. Intuitively, simply balancing the training dataset should resolve the problem. However, as indicated in [20, 21], constructing a dataset that is balanced in all its attributes can be very challenging. What’s more, even with balanced datasets, bias in the trained model cannot be eliminated completely [22].

The more recent deep learning methods try to enforce fairness protection with images through additional constraints, by removing sensitive features, and/or by learning fair representations. These strategies are often applied during training with the overarching objective of minimizing prediction gaps across the subgroups. The main challenge with this work is to remove any spurious bias that favors one subgroup over another. Part of this involves deriving invariant features during the learning process that will generalize well across domains. Broadly speaking, deep model studies in image fairness encourage a deeper understanding of the dynamic learning procedures of the model.

To better understand these studies and to encourage further development in the field, we, with this paper, have summarized the deep learning based fairness protection methods for images. Beyond highlighting the differences between image and numerical inputs, we also aim to reveal research trends in the field; summarize the methods and expose the fundamentals that bind these approaches; and discuss the main challenges researchers are currently facing in ensuring better protection.

Fairness concerns a range of fields. For example, different biases, such as historical bias, measurement bias, and evaluation bias, lead to different types of unfair predictions [10]. Further, fairness is also a subject that concerns social issues and impacts [11, 23]. Grgic-Hlaca et al. [24], for example, discuss definitions of fairness among different groups of people, while Mehrabi et al. [10] regard social historical reasons as sources of unfair bias. Hence, to narrow down the scope of this survey, we have concentrated on research in the field, considering only studies on representation bias—also known as data imbalance bias—which is the most commonly studied bias.

Additionally, there are some traditional image processing methods that can enforce fairness that we have not covered, whether in the pre-processing phase, such as data sampling and re-weighting, or in the post-processing phase, such as parameter post-tuning. As discussed above, these methods are inefficient compared with deep models, and we exclude them to avoid repeating work already undertaken in other surveys. Our focus remains fixed on deep model methods. We also note that deep model studies from other fields, such as domain adaptation, could also be adopted for enforcing image fairness. These methods were included if they considered sensitive features, such as gender and race.

There have been several other surveys on fairness protection, each with its own distinct focus. For example, Mitchell et al. [3] concentrate on summarizing fairness notions and metrics. Mehrabi et al. [10] focus on the sources of discrimination. Quy et al. [25] examine the underlying relations among the attributes of fairness protection datasets. Caton and Haas [26] provide an overview of fairness protection in machine learning, from metrics to approaches to dilemmas. As for fairness protection with deep models, Malik and Singh [27] discuss general deep learning technology, offering an introduction to unfair interpretation. Du et al. [28] present deep methods in terms of the bias found in inputs and representations, while Shi [29] looks at issues of unfairness in deep federated learning methods. Our work provides a thorough summary of image fairness protection with deep models. We present a comprehensive view of the problems, models, and challenges associated with this area. Unlike other surveys, we have analyzed the research trends in fairness protection for deep image models. We have also pinpointed three fundamental challenges to better fairness and discussed solutions drawn from other fields. Table 1 lists the different scopes of each of the fairness surveys.

Table 1 A comparison of different fairness protection surveys

In summary, this work contributes the following additions to the literature:

  • We have highlighted the difference between numerical and image inputs, summarizing the different problem settings for image inputs with deep models;

  • Research trends in the field are extracted and outlined;

  • The methods reviewed are classified into four approaches and compared against different fairness characteristics; and

  • Three fundamental challenges for better fairness protection have been identified with potential solutions introduced from other fields.

Figure 1 shows the main structure of this survey. The survey begins with background information on fairness protection in Sect. 2. Research trends with different problem settings are then discussed in Sect. 3. The deep model methods for protecting fairness are introduced in Sect. 4. Section 5 re-iterates the methods in terms of three challenges. Additionally, as fairness and privacy are closely related, we discuss this issue in Sect. 6. Future directions and conclusions are presented in Sect. 6.4.

Fig. 1 The main structure of this survey: introducing image fairness protection with deep model methods from problems, models, and challenges

2 Background and preliminaries

Before jumping into image fairness methods, we need to introduce the background and preliminaries of image fairness. This includes the relevant definitions, notions and measurements, datasets, and the methods associated with protecting fairness with images. We will also introduce the most widely adopted and fundamental deep models in image fairness protection studies.

2.1 Definitions

The most common case considered in fairness studies is a binary classification problem with data \(X \in R^n\), targets \(Y \in \{0, 1\}\) and sensitive attributes \(S \in \{0, 1\}\). The aim of enforcing fair predictions is to learn a model \(f : X \rightarrow Y\) whose predictions \(\hat{Y} \in \{0, 1\}\) are maximally close to Y while being fair for S under biased representations of training data.

Figure 2 illustrates a biased representation problem, also referred to as imbalance bias. In the image, the circles and squares are classification targets \(y_0\), \(y_1\), while the colors are sensitive features \(s_0\), \(s_1\). A well-trained and fair model should classify the shapes independently of the colors. However, the classifier favors blue samples over green ones due to short-cut learning issues and an over-representation of blue training samples. This spurious learning of features leads to unfair predictions. The most considered sensitive features in fairness studies are age, race, and gender.
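To make this setup concrete, the following sketch (our own toy illustration, not code from any cited study) constructs a synthetic dataset in which the sensitive attribute is spuriously correlated with the target, mimicking the representation bias of Fig. 2: a “short-cut” feature encodes the sensitive attribute and is highly predictive on the biased training set.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Binary target (shape) and binary sensitive attribute (color).
y = rng.integers(0, 2, size=n)
# Representation bias: the sensitive attribute agrees with the target 90% of the time.
s = np.where(rng.random(n) < 0.9, y, 1 - y)

# One genuinely predictive feature and one short-cut feature that only encodes s.
x_task = y + 0.5 * rng.standard_normal(n)
x_shortcut = s + 0.5 * rng.standard_normal(n)
X = np.stack([x_task, x_shortcut], axis=1)

# A classifier trained on X can exploit x_shortcut, so its errors concentrate
# on the under-represented (s != y) samples, i.e., the predictions become unfair.
print("P(s = y) on the biased training set:", (s == y).mean())
```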

Fig. 2 Representation bias (imbalance bias)

2.2 Fairness notions and measurements

Aside from the social, ethical, or philosophical debates on defining fairness [30], studies usually consider fairness notions in three respects: (1) individually, where similar examples should be treated similarly [31]; (2) causally, where sensitive features should be independent of the target predictions [32,33,34]; and (3) as a group, where different subgroups defined by the sensitive features should receive similar outputs [31, 33, 35]. Group fairness notions can also be regarded as statistical fairness notions since they compare statistics, such as accuracy or false positive rates. For example, demographic parity (DP) [31, 33] compares the rate of positive predictions across subgroups:

$$\begin{aligned} P_{\mathrm{s}_0}{\{\hat{Y}=1\}} = P_{\mathrm{s}_1}{\{\hat{Y}=1\}}, \end{aligned}$$

Equalized opportunity (EOP) [35] requires equal true positive rates across subgroups:

$$\begin{aligned} P_{\mathrm{s}_0}{\{\hat{Y}=1|Y=1\}} = P_{\mathrm{s}_1}{\{\hat{Y}=1|Y=1\}}, \end{aligned}$$

Equalized odds (EOD) [35] requires equal true positive and false positive rates across subgroups:

$$\begin{aligned} P_{\mathrm{s}_0}{\{\hat{Y}=1|Y=i\}} = P_{\mathrm{s}_1}{\{\hat{Y}=1|Y=i\}}, i=0,1, \end{aligned}$$

Different fairness notions need to be considered in different scenarios. For example, individual fairness requires that similar samples receive similar treatment, which suits general fairness tasks; however, the similarity measure must be defined for each particular task, which is generally challenging [36]. Causal fairness notions should be adopted when causal graphs are considered. One example is counterfactual fairness [33], which requires similar predictions for samples and their counterfactual counterparts; however, specifying either the counterparts or the causal relationships within a dataset is challenging. Group fairness is the most widely adopted, since it can be enforced with explicit statistical constraints.

Fairness measurements operationalize these notions to quantify how fair a model’s predictions are. For example, disparate impact (DI) builds on demographic parity:

$$\begin{aligned} \hbox {DI}=\frac{P_{\mathrm{s}_0}{\{\hat{Y}=1\}}}{P_{\mathrm{s}_1}{\{\hat{Y}=1\}}}, \end{aligned}$$

As the definition shows, disparate impact lies in the range \([0, \infty )\), where 1 denotes perfect demographic parity. A DI of \(< 1\) indicates that the classifier favors the privileged group, and \(\hbox {DI} > 1\) means the opposite. Alternatively, the difference in positive prediction rates can also be used as a demographic parity measurement:

$$\begin{aligned} \Delta _\mathrm{DP}=|P_{\mathrm{s}_0}{\{\hat{Y}=1\}}-P_{\mathrm{s}_1}{\{\hat{Y}=1\}}|, \end{aligned}$$

The metrics true positive rate balance (TPRB) and true negative rate balance (TNRB) have also been adopted alongside the notions of equalized opportunity and equalized odds:

$$\begin{aligned}&\hbox {TPRB}=P_{\mathrm{s}_0}{\{\hat{Y}=1|Y=1\}}-P_{\mathrm{s}_1}{\{\hat{Y}=1|Y=1\}}, \\&\hbox {TNRB}=P_{\mathrm{s}_0}{\{\hat{Y}=1|Y=0\}}-P_{\mathrm{s}_1}{\{\hat{Y}=1|Y=0\}}, \\&\Delta _\mathrm{EP}= \frac{1}{2}\hbox {TPRB} + \frac{1}{2}\hbox {TNRB}, \end{aligned}$$

Note that these are by no means the only fairness notions and measurements. In their survey, Mehrabi et al. [10] introduce the most commonly used fair notions, while Islam et al. [37] summarize more than 20 different notions and measurements. A more detailed comparison of these different notions and measures and when they are used follows in Sect. 4.
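For reference, the snippet below computes the measurements above from binary predictions, labels, and sensitive attributes. It is a minimal sketch with our own function names, not code from the cited works, and it assumes both subgroups are present in the evaluated data.

```python
import numpy as np

def rate(y_hat, sel):
    """P(y_hat = 1) over the boolean selection sel."""
    return y_hat[sel].mean()

def fairness_measurements(y_hat, y, s):
    g0, g1 = (s == 0), (s == 1)
    p0, p1 = rate(y_hat, g0), rate(y_hat, g1)
    di = p0 / p1                                   # disparate impact
    dp_gap = abs(p0 - p1)                          # Delta_DP
    tprb = rate(y_hat, g0 & (y == 1)) - rate(y_hat, g1 & (y == 1))
    tnrb = rate(y_hat, g0 & (y == 0)) - rate(y_hat, g1 & (y == 0))
    delta_ep = 0.5 * tprb + 0.5 * tnrb             # Delta_EP
    return {"DI": di, "DP_gap": dp_gap, "TPRB": tprb,
            "TNRB": tnrb, "Delta_EP": delta_ep}

# Toy usage with random predictions.
rng = np.random.default_rng(0)
y, s, y_hat = (rng.integers(0, 2, 1000) for _ in range(3))
print(fairness_measurements(y_hat, y, s))
```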

2.3 Image fairness protection methods

Methods in fairness protection studies are generally categorized into three groups: (1) pre-processing methods such as sampling and re-weighting [38,39,40], where manipulations of training data are conducted before training; (2) in-processing methods [32, 41, 42], where the methods adopt additional losses for different fair metrics during the learning of the models; and (3) post-processing methods [43,44,45], where the methods tune the prediction results or adjust the decision boundaries to reduce unfair predictions.

Most surveys generally classify deep model methods as in-processing methods without much further discussion. However, since our focus is on deep models, we have divided the methods into four different approaches: fair constraint methods, data operation methods, methods that remove sensitive features, and methods that learn independent features. Each group is introduced in more detail in Sect. 4.

2.4 Datasets

Many of the studies on fairness rely on numerical datasets, such as Adult [46] and COMPAS [47]. The Adult dataset provides 48,842 records of people’s salaries, including race and gender attributes. The COMPAS dataset scores a criminal defendant’s likelihood of re-offending (recidivism), with annotated attributes for over 10,000 samples.

The most widely adopted image datasets for fairness studies are UTKFace [48] and CelebA [49]. The UTKFace dataset consists of over 20,000 face images with annotations including age and ethnicity. The CelebA dataset provides more than 200K celebrity images with 40 attribute annotations. Generally speaking, age, gender, and race are the attributes most commonly considered in image fairness studies.

Other datasets that have featured in papers on image fairness include the German credit dataset [46], the Diversity in Faces dataset [50], and the Yale B face dataset [51]. The Mehrabi et al. survey [10] also discusses some additional datasets. Table 2 provides a summary.

Table 2 The datasets most commonly adopted in fairness studies

2.5 Deep learning and fairness

Most deep models for image fairness protection adopt generative models [52, 53] with adversarial designs that generate latent features or synthetic data from the original training data [54, 55]. The most common implementations are based on variational autoencoders (VAEs) [56, 57] or generative adversarial networks (GANs) [58]. VAE-based models enforce distribution similarity between the generated data and the original training data through similarity constraints, while GAN-based models generate data with additional adversarial model designs. To enforce fairness, deep models generate synthetic images with generative models and remove sensitive features with adversarial designs.

3 Problem settings for image fairness with deep model methods

This section begins with an analysis of the difference between numerical and image inputs in terms of fairness. This is followed by the various problem settings and contexts in which research on these methods has been presented. Our discussions cover both data inputs and deep learning technologies, and through these discussions, research trends in the field emerge.

3.1 Input differences with images

In terms of numerical inputs, sensitive features, such as gender, age, or race, are generally represented as discrete values or as a binary variable \(\{0,1\}\). Because it is possible to modify these sensitive features, fairness-aware methods such as data modification [59] or data generation [60] have emerged as a solution. However, with image data, the entanglement of features in high-dimensional domains makes representing features with explicit values generally impossible. Figure 3 illustrates the difference. As described by Mo et al. [21], due to this entanglement, general datasets suffer background bias and content bias, and constructing a balanced dataset that fairly considers all the different attributes is a challenging task [20]. What’s more, Xu et al. [22] observed that, even with balanced datasets containing equal training samples from a mixture of races, the trained model still suffered from unfair predictions. Another difference is that enumerating sensitive features is impossible because, unlike numerical data, images contain countless attributes. This can lead to difficulties with identifying bias among images [19, 61, 62] or collecting balanced image datasets [63,64,65].

Fig. 3 Gender information in the Adult dataset is stored as a binary value [46], while, in the CelebA image dataset, gender information is not stored in a tractable form [49]

3.2 Problem settings

In this section, we answer the question: What is the research focus of the different studies? The section starts with the problem settings introduced by the training inputs; the problem settings introduced by deep models then follow.

Generally, most image fairness studies focus on fairness-aware classifier training with imbalanced training datasets. Given sensitive features S and a learned model f, a fair learning objective can be expressed as:

$$\begin{aligned} \min l(Y,\hat{Y}) \quad \text {s.t.} \quad c(f(s_0,s_1)) < \theta , \end{aligned}$$
(1)

where l is the prediction loss, c represents the adopted fairness measurement, and \(\theta\) is a small threshold. The equation states that the trained model should maintain a low prediction loss while satisfying a fairness measurement constraint enforced by one of the fairness protection methods. Beyond this general setting, other problem settings have been considered. For example, studies [66,67,68] consider multiple sensitive attributes, \(S=\{S_1,S_2,...\}\), where a sensitive feature, such as race, can take multiple attribute values. With multiple values, fair methods that construct additional sensitive prediction heads become resource-intensive. Studies [69, 70] focus on universal fair representations without downstream tasks: \(l(I, \hat{I}), \quad c(g(s_0,s_1)) < \theta\), where g is a feature extraction model, and I and \(\hat{I}\) are the input images and the generated synthetic images, respectively. This is challenging since it requires the learned features to be independent of the sensitive features. Study [71] considers noisy labels \((\bar{Y},Y)\), where Y = \(\bar{Y}\) + \(\epsilon\), \(\bar{Y}\) represents the real labels, and Y denotes the noisy labels in the dataset. Grari et al. [72] examined continuous sensitive values, where \(s \in R\). This setting invalidates fair methods that rely on sensitive prediction heads, since building prediction heads for continuous values is difficult. Others have concentrated on discovering bias among datasets where the sensitive features S are unknown [61, 62, 73].

Turning to the problem settings introduced by deep models, some studies have observed unfair phenomena in deep models: Choi et al. [69] find that generative models amplify data bias, which leads to unfair image generation. Li et al. [74] consider fairness issues in deep clustering methods; Xu et al. [75] observe unfair results in adversarial training, while Chen et al. [76] look at fairness in graph deep models, and Shi et al. [29] summarize image fairness protection methods for deep federated learning models.

One issue that repeatedly crops up in studies on fairness protection concerns model utility once fairness constraints have been enforced. Chang et al. [77], for example, verify that fair models are more vulnerable to adversarial attacks. Mishler and Kennedy investigate the balance between accuracy and fairness [78]. Qian et al. [79] examine fairness methods in the context of deep learning, finding that fairness-protected models may suffer from large variance in their fairness results.

Moreover, some studies report that fairness protection issues share connections with issues in other fields. For instance, Zhao et al. [80] cast incremental learning problems to fair protection problems, while Wei et al. [81] explore fairness protection methods for long-tailed data problems. Table 3 summarizes the different problem settings with the corresponding approach.

3.2.1 Discussion

Although images as an input bring certain challenges to fairness protection, they share similar problem settings with numerical studies, such as considering multiple or continuous sensitive attribute values. Similarly, some of the difficulties raised by deep models have also been seen in previous studies, such as fairness for clustering or online learning. However, there are also emerging problems that are being addressed for the first time in the context of image fairness, such as fairness for image generation models, federated models, and graph models.

Table 3 Different problem settings for image fairness protection with deep models

4 Models of image fairness protections with deep models

While previous studies on fairness protection generally classify deep methods as in-processing methods, we have divided them into four further groups: fair constraint methods, data operation methods, methods that remove sensitive features, and methods that learn independent features. This section introduces each approach in detail and concludes with a comparison of the various characteristics of the methods, including the metrics and datasets used and the sensitive features considered.

4.1 Fair constraints

Fair constraint methods incorporate additional loss constraints and learning objectives into the learning procedure in such a way that the learned models satisfy the corresponding fairness metrics. Examples of this approach can be found in [75, 77, 82,83,84], where the learning problems are solved following Eq. 1. This equation can be seen as a constrained optimization problem. As in other mathematical modeling studies [85,86,87], the problem can be solved with Lagrangian methods. Given some sensitive features S and a learning model f with weights w, Eq. 1 can be expressed as:

$$\begin{aligned} \min _{w}\max _{\lambda \in R_{+}}{l(Y, f(I,w))+\lambda c(w, S)}, \end{aligned}$$

where \(\lambda\) is the Lagrange multiplier. Swapping the order of optimization yields the dual problem, which maximizes a lower bound over \(\lambda\):

$$\begin{aligned} \max _{\lambda \in R_{+}} \min _{w}l(Y, f(I,w))+\lambda c(w, S). \end{aligned}$$

The general approach to solving this max-min problem is to update the weights twice at every iteration: once to minimize the loss w.r.t. w and once to maximize it w.r.t. \(\lambda\) [88]. Although these methods all rely on constraints, they are designed for different fairness metrics and settings; more details are presented in Table 5. Notably, while fair constraint methods enforce fairness protection, they introduce target-irrelevant learning objectives. To avoid this problem, data operation methods have been proposed for better fairness protection; these are discussed next.
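To make the alternating update scheme concrete before turning to data operations, the following sketch (a minimal PyTorch illustration under simplified assumptions of our own, not the implementation of any cited method) uses a soft demographic parity gap as the constraint c with slack \(\theta\) and a single multiplier \(\lambda\): the weights are updated to minimize the Lagrangian, and \(\lambda\) is updated by gradient ascent on the constraint violation.

```python
import torch
import torch.nn as nn

def dp_gap(probs, s):
    """Soft demographic parity gap |E[p | s=0] - E[p | s=1]| (assumes both groups in the batch)."""
    return (probs[s == 0].mean() - probs[s == 1].mean()).abs()

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
opt_w = torch.optim.Adam(model.parameters(), lr=1e-3)
lam = torch.tensor(0.0, requires_grad=True)       # Lagrange multiplier, lambda >= 0
opt_lam = torch.optim.SGD([lam], lr=1e-2)
bce = nn.BCEWithLogitsLoss()
theta = 0.02                                      # allowed fairness slack

def training_step(x, y, s):
    # 1) Minimize the Lagrangian w.r.t. the model weights (lambda held fixed).
    logits = model(x).squeeze(1)
    probs = torch.sigmoid(logits)
    loss = bce(logits, y.float()) + lam.detach() * (dp_gap(probs, s) - theta)
    opt_w.zero_grad(); loss.backward(); opt_w.step()

    # 2) Maximize the Lagrangian w.r.t. lambda (gradient ascent via a negated loss).
    with torch.no_grad():
        violation = dp_gap(torch.sigmoid(model(x).squeeze(1)), s) - theta
    lam_loss = -(lam * violation)
    opt_lam.zero_grad(); lam_loss.backward(); opt_lam.step()
    with torch.no_grad():
        lam.clamp_(min=0.0)                       # keep the multiplier non-negative
```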

4.2 Data operations

Intuitively, there are traditional image processing methods that can enforce fairness. However, studies have shown that general image processing methods are inefficient compared to deep model methods [19]. Thus, some deep models have incorporated techniques like data generation, sampling, and re-weighting to enhance performance.

4.2.1 Data generation

Data generation methods generate synthetic training samples or features with adversarial models to balance the dataset. Hwang et al. [89], for example, generated under-represented samples with CycleGAN [90], while Joo et al. [65, 91] generated synthetic samples through latent feature manipulations with GAN inversion methods [92]. Alternatively, others have reconstructed the datasets through data augmentation methods [82, 93]. With mixed-up images composed from different subgroups, models tend to learn fairer features; Du et al. [94], for instance, presented a mix-up scheme to generate neutralized features.
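As a simplified illustration of this idea (a hypothetical sketch in the spirit of, but not identical to, the scheme of [94]), the snippet below mixes each sample from one sensitive subgroup with a random sample from the other, so the mixed inputs carry weaker sensitive cues; the mixed batch is then trained with the usual mixup objective \( \lambda\, l(\hat{y}, y_a) + (1-\lambda)\, l(\hat{y}, y_b) \).

```python
import torch

def cross_group_mixup(x, y, s, alpha=0.4):
    """Mix samples from subgroup s=0 with random samples from subgroup s=1."""
    idx0 = torch.nonzero(s == 0, as_tuple=True)[0]
    idx1 = torch.nonzero(s == 1, as_tuple=True)[0]
    n = min(len(idx0), len(idx1))
    a = idx0[:n]
    b = idx1[torch.randperm(len(idx1))[:n]]
    lam = torch.distributions.Beta(alpha, alpha).sample()   # mixing coefficient
    x_mix = lam * x[a] + (1 - lam) * x[b]
    # Train with: lam * loss(model(x_mix), y[a]) + (1 - lam) * loss(model(x_mix), y[b])
    return x_mix, y[a], y[b], lam
```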

4.2.2 Data sampling

Data sampling methods balance the training samples across subgroups during training iterations. In this vein, Roh et al. [30] designed a learning scheme for batch sample selection to balance the prediction gap between subgroups. Another technique is to design deep methods that optimize the sample selection procedure [95, 96]. Shekar et al. [97] enforced fairness through sampling with hard example mining methods.
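A simple baseline in this family, shown below, is to re-sample so that each (target, sensitive-attribute) group is drawn with equal probability. This is a generic sketch using PyTorch's WeightedRandomSampler rather than a re-implementation of any cited method, and it assumes binary y and s with all four groups present.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

def balanced_group_sampler(y, s):
    """Weight every sample by the inverse frequency of its (y, s) group."""
    group = y * 2 + s                                   # 4 groups for binary y and s
    counts = torch.bincount(group, minlength=4).float() # assumes every group is non-empty
    weights = 1.0 / counts[group]
    return WeightedRandomSampler(weights, num_samples=len(y), replacement=True)

# Toy usage: each batch now contains roughly equal numbers of all four groups.
x = torch.randn(1000, 16)
y = torch.randint(0, 2, (1000,))
s = torch.randint(0, 2, (1000,))
loader = DataLoader(TensorDataset(x, y, s), batch_size=64,
                    sampler=balanced_group_sampler(y, s))
```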

4.2.3 Data re-weighting

Re-weighting methods introduce parameter weights into the model’s design to balance samples or learned features. Zhao et al. [80] reduced biased predictions by attaching a fully connected layer, called a weight aligning layer, to a classifier to re-assign weights across groups. Gong et al. [98] designed different convolution kernels for different attribute values, then re-fused the values to balance feature learning and enforce fairer predictions.

However, while data operations are good for balancing datasets, they do not prevent sensitive features from being learned during training. To overcome this problem, the sensitive features need to be eliminated using a removal method as discussed next.

4.3 Removing sensitive features

Zemel et al. [99] were the first to cast fairness problems as an issue of removing sensitive features from numerical input data. Ever since, researchers have been crafting new ways to remove sensitive features. Some are designed for numerical inputs [38, 100]. Others are designed for image inputs under a range of situations [54, 55, 64, 66, 67, 74, 101,102,103,104]. However, despite their subtle differences, all these methods follow the same broad approach. Figure 4 illustrates the general model structure. It is an adversarial framework that learns the target task while suppressing the ability to predict sensitive information. Specifically, an encoder first extracts latent features as a proxy for the input. Then, a task prediction head and a sensitive attribute prediction head are attached to predict the learning targets and the attribute values, respectively. To ensure the extracted features contain little to no sensitive information, a reversed (inverse) gradient is applied to the sensitive attribute prediction branch during learning. With h representing the extracted features, and f and \(f_\mathrm{s}\) being the task prediction head and the sensitive attribute prediction head, respectively, the learning objective can be expressed as:

$$\begin{aligned} \min _{h,\, f} \max _{f_\mathrm{s}} E[\,l(Y, f(h)) - l_{\mathrm{s}}(S, f_{\mathrm{s}}(h))\,], \end{aligned}$$

where l and \(l_\mathrm{s}\) are the training losses of the target and sensitive attribute predictions. The learning adopts an adversarial training setting by minimizing the target classification losses and maximizing the attribute prediction losses.
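The sketch below gives a minimal PyTorch version of this framework (our own simplified illustration, not the code of any specific cited method): a gradient reversal layer leaves the forward pass unchanged but flips the gradient flowing from the sensitive attribute head back into the encoder, so \(f_\mathrm{s}\) learns to predict S while the encoder learns to hide it.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lam in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128), nn.ReLU())
task_head = nn.Linear(128, 2)     # f: target prediction head
sens_head = nn.Linear(128, 2)     # f_s: sensitive attribute prediction head
ce = nn.CrossEntropyLoss()

def adversarial_removal_loss(images, y, s, lam=1.0):
    h = encoder(images)
    task_loss = ce(task_head(h), y)
    # The reversed gradient makes the encoder maximize the sensitive prediction loss
    # while sens_head itself is still trained to minimize it.
    sens_loss = ce(sens_head(GradReverse.apply(h, lam)), s)
    return task_loss + sens_loss
```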

Fig. 4 Adversarial framework with inverse gradient for the methods that remove sensitive features

It is worth noting that, although removing sensitive features enforces fairness, removing features can also deteriorate prediction performance. Therefore, the last type of scheme tries to disentangle sensitive features rather than remove them.

4.4 Learning independent features

Methods of learning independent features try to enforce fairness by guaranteeing that the features adopted for task predictions do not contain sensitive information—that is, that the prediction features are independent of the sensitive features. Generally, deep models predict tasks, sensitive attribute values, and reconstruct input data at the same time. A range of similarity measurements between a task’s features and its sensitive features have been put forward to enforce independence. The idea is to minimize the similarity between learned features and sensitive features during training. Figure 5 presents the general framework. With the learned features for the tasks \(h_\mathrm{t}\), the sensitive features \(h_\mathrm{s}\), and the reconstructed images \(\hat{I}\), the loss functions can be described as:

$$\begin{aligned} L = \lambda _1 l_\mathrm{tar}+\lambda _2 l_\mathrm{recon} (I, \hat{I})+\lambda _3 l_\mathrm{simi}(h_\mathrm{t}, h_\mathrm{s}), \end{aligned}$$

where \(l_\mathrm{tar}\) is the target loss, \(l_\mathrm{recon}\) is the reconstruction loss which maintains the utility of generated data, and \(l_\mathrm{simi}\) measures the similarity distance between features.

The studies that have adopted this method in our review include [53, 63, 67, 68, 73, 105,106,107,108,109,110,111,112,113,114,115,116,117]. The different similarity measurements that have been used include maximum mean discrepancy (MMD) [105] in [105,106,107], the Kullback–Leibler divergence (KL) [68, 73, 108,109,110,111], the Hilbert–Schmidt independence criterion (HSIC) [53, 112, 113], matrix correlations [114], cosine similarity [63, 115, 116], and L1 and Euclidean distances [67, 117]. Additionally, Gitiaux and Rangwala [111] designed a binary representation method to better learn independent features. Table 4 provides a detailed summary of the main methods referred to in this section.
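As an illustration of this loss, the sketch below (a simplified example under our own assumptions, not the exact objective of any single cited paper) uses the absolute cosine similarity between the task features \(h_\mathrm{t}\) and the sensitive features \(h_\mathrm{s}\) as \(l_\mathrm{simi}\), combined with a cross-entropy target loss and a mean-squared reconstruction loss.

```python
import torch.nn.functional as F

def similarity_penalty(h_t, h_s):
    """l_simi: mean absolute cosine similarity between task and sensitive features."""
    return F.cosine_similarity(h_t, h_s, dim=1).abs().mean()

def independent_feature_loss(logits, y, x_recon, x, h_t, h_s,
                             lam_tar=1.0, lam_recon=1.0, lam_simi=1.0):
    l_tar = F.cross_entropy(logits, y)        # target prediction loss
    l_recon = F.mse_loss(x_recon, x)          # reconstruction loss (preserves utility)
    l_simi = similarity_penalty(h_t, h_s)     # pushes h_t to be independent of h_s
    return lam_tar * l_tar + lam_recon * l_recon + lam_simi * l_simi
```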

Fig. 5 General model structure for the independent feature learning methods

4.4.1 Other studies

Apart from deep model designs, there are other technologies for enforcing image fairness protection. For example, Wang and Deng [118] learn adaptive classification margins for different subgroups with deep reinforcement learning methods (DQN [119]), while Kim et al. [120] adopt boosting methods to promote fairness.

Table 4 A comparison of deep learning image fairness models

4.4.2 Discussion

Although the above methods are widely adopted for fairness-aware training, each direction has its limitations. Using fair constraints usually results in an accuracy drop, often referred to as the utility issue, which has been reported in several studies such as [35, 121, 123, 124]. Moreover, deep model optimizations are often non-convex and challenging to solve in general, which may lead to difficult or unstable training [26, 125].

With the data generation techniques, generating under-represented images or counterfactual samples for model training is difficult, as disentangling features and interpreting high-dimensional inputs are always challenging [92]. Additionally, Vinyals [126, 127] shows that, even with highly photorealistic generated images, e.g., from BigGAN [128], training a classifier on synthetic images is never as good as training on real ones.

Removing sensitive features and learning independent features both involve representation learning, which raises similar concerns over utility [19, 110, 117]. It also creates difficulties when attempting to learn invariant or independent features [102]. In their survey, Caton and Haas [26] discuss the limitations of fair representation learning studies in detail.

Despite all the different directions pursued, each aims to break spurious correlations between learning targets and sensitive features. Fair constraint methods encourage models to make similar predictions across subgroups of sensitive features, which decouples the predictions from the sensitive features. Data operation methods balance the datasets and re-weight the feature maps to impede the model’s short-cut learning. Sensitive feature removal prevents the models from extracting sensitive features through deep adversarial designs. And learning independent features encourages the model predictions to be independent of the sensitive features through various similarity measurements. Nevertheless, all four approaches aim to generate fair predictions by avoiding the learning of spurious correlations.

4.4.3 Fairness characteristics

Having discussed the methods themselves, it is now important to cover the measurements, datasets, and attributes considered for a more detailed comparison between the methods. Table 5 presents this comparison as a matrix. The table shows that most studies concentrate on a single fairness metric, while strategies that involve multiple metrics have drawn less attention. Additionally, there are fewer studies on individual or counterfactual fairness, since the metrics required are stricter than the others. Another issue that has been raised is that the frameworks are not particularly general: a method designed for one fairness metric may not be applicable to others [79]. Hence, general frameworks for fairness are still necessary. As a last observation, most of the studies concern data with a single sensitive feature, such as gender or race. Problems involving multiple sensitive features or features with non-binary or continuous values are far less studied.

Table 5 A comparison of the main fair characteristics

In real applications, given limited medical data available for training, Frid-Adar et al. [129] opted to generate their own data and achieved outstanding fairness protection results. After comparing different fairness protection methods, Qian et al. [79] found that fair constraint methods were suitable for fairness protection under various fair metrics since the constraints are explicitly attached to the loss functions. Wang et al. [19] find that, in general, representation methods such as removing sensitive features and learning independent features can enforce fairness protection. However, learning fair representations is difficult.

Overall, despite the different methods, the current studies have only eased the unfair prediction issue, not solved it. Fixing all the problems with fairness protection still holds many challenges, which are the subject of the next section. It is nonetheless notable that, although the reported fairness improvements are often modest (e.g., around 2%), these studies are working towards solving the issue and towards a better understanding of the dynamic learning procedures of deep models.

5 Challenges to ensuring greater fairness

This section discusses the challenges underlying the above methods. From our review, we settled on three main questions that need to be answered before fairness protection can become a largely resolved issue: (1) How can we learn fairer representations? (2) How can we maintain utility in the face of data modifications (data operation methods) or additional training objects (fair constraints)? And (3) Since most studies concern public datasets, how can we reveal imbalance bias in real data and real-world situations? Each of these questions is discussed in more detail next.

5.1 How to get fairer representations?

The critical issue for the methods that involve removing sensitive features or learning independent features with similarity constraints is to learn fair representations. Although previous studies have proposed solutions like gradient reversing methods or learning independent features, solutions from other fields experiencing similar problems may also be helpful. We reviewed some of the literature on image causality inference, domain adaptation (invariant representation learning), and transfer learning and found several methods worthy of discussion that could be adopted to improve fair representation learning.

5.1.1 Image causality inference

Image causality inference is an emerging topic. The idea is to empower learning models with the ability to deal with causal effects; they can either remove spurious bias [134], disentangle the desired model effects [135], or modularize reusable features that generalize well [136]. The main approach involves interventions on inputs or features, while requiring similar predictions for the original and modified inputs. As a result, the causal chains between skewed attributes and targets are broken. Given an original sample x and its counterfactual \(\hat{x}\) with a learning target Y, the consistency rule is formulated as:

$$\begin{aligned} P(Y|x) = P(Y|\hat{x}), \end{aligned}$$

where P are the prediction probabilities. Most studies construct a contrastive loss to ensure similar predictions. For the training pair x and \(\hat{x}\) with a learning model f, the contrastive loss can be expressed as:

$$\begin{aligned} L_\mathrm{con}=-\log E\left[ \frac{\exp (f(x)^{T} f(\hat{x}) /\tau )}{\sum _{j} \exp (f(x)^{T} f(x_j)/\tau )}\right] , \end{aligned}$$

where the \(x_j\) are negative samples drawn from other classes and \(\tau\) is the scalar temperature hyper-parameter [137].
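A minimal implementation of this consistency objective is sketched below (our own simplified InfoNCE-style version, assuming the counterfactual \(\hat{x}\) for each sample is already available, e.g., from a generative model): features of x and \(\hat{x}\) are pulled together, while the other samples in the batch serve as negatives.

```python
import torch
import torch.nn.functional as F

def consistency_contrastive_loss(f_x, f_xhat, tau=0.1):
    """InfoNCE-style loss; row i of f_xhat is the counterfactual of row i of f_x."""
    f_x = F.normalize(f_x, dim=1)
    f_xhat = F.normalize(f_xhat, dim=1)
    logits = f_x @ f_xhat.t() / tau                       # (N, N) similarity matrix
    targets = torch.arange(f_x.size(0), device=f_x.device)
    return F.cross_entropy(logits, targets)               # positives on the diagonal
```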

The connection between causality and fairness has been shown in several studies. Kusner et al. introduced counterfactual fairness, which requires that samples and their counterfactual counterparts share similar predictions [33]. Ramaswamy et al. and Yurochkin et al. generated synthetic images as counterfactual samples by manipulating latent vectors [36, 65]. Sarhan et al. treated the sensitive features S and the learning targets Y as causally independent by encouraging orthogonality between the mean vectors of the target and sensitive feature distributions [68].

5.1.2 Domain adaptation

Domain adaptation, also known as invariant representation learning, refers to methods that attempt to learn invariant features across domains in order to improve model generalization. Invariant learning serves as a proxy for causality inference [138]: it aims to find features that are domain-invariant, i.e., that reliably predict the true class regardless of the domain environment [139]. As domains can be viewed as sensitive group features in fairness protection, fairness problems can be cast as a problem of learning invariant features. Invariant risk minimization [140] is one of several recently successful approaches in the field that minimize prediction distances across domains. Given any two domains \(e_{1}, e_{2} \in E\) and learned features \(h \in H\), the aim of these studies can be expressed as:

$$\begin{aligned} E [y \mid g(x)=h, e_{1}]=E[y \mid g(x)=h, e_{2}], \end{aligned}$$

where g is the feature extraction model.

Requiring invariant features and consistent predictions across domains can be interpreted as a group fairness metric, such as demographic parity. Similar interpretations are discussed in [141], which enumerates several group fairness criteria and draws analogies to domain generalization methods. For fairness protection, Adragna et al. empirically demonstrate that domain adaptation methods can be used to enforce fairness protection by learning models that are invariant to the features containing sensitive attributes [142]. They adopted invariant risk minimization to encourage models to learn invariant predictors for different sensitive subgroups. Although they considered only textual inputs for comment toxicity classification tasks, in principle, the proposed methods could be applied to tasks with images.
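The sketch below shows how the IRMv1-style penalty of [140] can be applied with the sensitive subgroups playing the role of environments (a simplified illustration of our own, assuming a binary target and one logit per sample): the squared gradient of each subgroup's risk with respect to a dummy scaling of the classifier measures how far the classifier is from being simultaneously optimal for both subgroups.

```python
import torch
import torch.nn.functional as F

def irm_penalty(logits, y):
    """IRMv1 penalty for one environment: ||grad_w R(w * f)||^2 evaluated at w = 1."""
    scale = torch.ones(1, device=logits.device, requires_grad=True)
    risk = F.binary_cross_entropy_with_logits(logits * scale, y.float())
    (grad,) = torch.autograd.grad(risk, scale, create_graph=True)
    return (grad ** 2).sum()

def fair_irm_loss(model, x, y, s, penalty_weight=10.0):
    loss, penalty = 0.0, 0.0
    for group in (0, 1):                     # treat sensitive subgroups as environments
        logits = model(x[s == group]).squeeze(1)
        y_g = y[s == group]
        loss = loss + F.binary_cross_entropy_with_logits(logits, y_g.float())
        penalty = penalty + irm_penalty(logits, y_g)
    return loss + penalty_weight * penalty
```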

5.1.3 Transfer learning

Some image fairness methods enforce fairness by requiring similar feature distributions across subgroups [68, 73, 108, 109]. Although the problem is not the same as transfer learning, which requires transferring knowledge across domains, the two share similar tools, such as MMD in [106, 107], KL divergence in [73], and HSIC constraints in [53]. We believe that emerging methods in transfer learning could be further adapted to encourage better fairness protection.

5.1.4 Other fields

There are also other fields that closely relate to fairness protection, such as: out-of-distribution detection (OOD) [143], which distinguishes minorities based on feature differences; GAN inversion [65], which inverts images into disentangled latent space features; contrastive learning [144], which requires consistent predictions for training pairs; incremental learning [80], which manipulates the learning features for different classes. Such methods might also be helpful for encouraging fair representation learning and fairness protection.

5.2 How to maintain prediction performance after enforcing fairness?

Fairness methods may modify training samples or introduce target-irrelevant constraints. Naturally, this raises concerns about whether applying these methods will cause the model’s performance to deteriorate. Most studies on enforcing fairness in deep models witness accuracy drops after fairness protection has been applied [121, 123]. A similar phenomenon has also been observed in traditional machine learning studies [35, 124], so this comes as no surprise. Further, Chang et al. [77] observed that fair models are more vulnerable to adversarial attacks than their original counterparts. Van et al. [145] discuss adversarial defenses for fair models. Qian et al. [79] illustrate that learned models tend to have larger fairness variance after fairness enforcement. In other words, they have utility problems. Few studies have focused on this problem, so this is a future challenge still to be met.

5.2.1 Reasons

Currently, no conclusive studies are available to explain the reasons for these utility issues. For methods that remove sensitive features, the issue may be caused by the removal inadvertently discarding features related to the targets [40]; for fair constraint methods, it could be due to target-irrelevant constraints, such as the additional losses introduced during training.

5.2.2 Methods

To maintain utility, one promising direction is to maintain similarity between the original and the learned data. Calmon et al. [38] introduced a utility preservation constraint to guarantee that the distributions of the original data and the latent space features remain statistically close; specifically, they adopted the KL-divergence to measure the distance between the two distributions. Zhang et al. measured the same distance in Euclidean terms [100], while Xu et al. used dimension-wise probability to check whether the modified data maintains a similar distribution [52]. Beyond distribution similarities, Quadrianto et al. adopted image reconstruction losses to ensure similar semantic meanings for generated images [53].
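As a concrete example of such a distribution-similarity constraint, the sketch below (our own simplified illustration, not the estimator used in [38]) approximates the original and the modified data as diagonal Gaussians from batch statistics and penalizes the KL divergence between the two; the term can be added to the fairness objective with a trade-off weight.

```python
import torch

def gaussian_kl_utility(x, x_mod, eps=1e-6):
    """KL( N(mu_mod, var_mod) || N(mu, var) ) between diagonal Gaussian approximations
    of the modified and the original data, summed over feature dimensions."""
    mu, var = x.mean(0), x.var(0) + eps
    mu_m, var_m = x_mod.mean(0), x_mod.var(0) + eps
    kl = 0.5 * (torch.log(var / var_m) + (var_m + (mu_m - mu) ** 2) / var - 1.0)
    return kl.sum()

# Usage: total_loss = task_loss + fairness_loss + beta * gaussian_kl_utility(x, x_mod)
```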

Previous studies concentrate on maintaining similarity to guarantee the utility of fairness protection methods. However, proposing a general and practical metric to measure utility is still challenging, especially for image modifications. Additionally, the introduced methods have only been applied to datasets with limited samples under specific settings. Thus, the methods’ performance could be positively related to factors such as the size of data, the sparsity of data, the data that is modified, or the specific machine learning algorithm used. Further study on the utility impact issue is still necessary.

5.3 How to find imbalanced biases among data?

Since most studies on fairness protection rely on public datasets or synthetically generated data with given sensitive feature annotations, more work is needed to determine how we can effectively discover bias in real-world data. Hu et al. introduced a human-in-the-loop method to find bias [62]: they designed questions asking people whether images contained sensitive information and revealed sensitive attributes based on the resulting statistics. Amini et al. discovered under-represented data based on latent representations [73]; they distinguished feature representations between majority and minority data, which is challenging with image data. Li et al. [61] combined these approaches and tried to find biased features through GAN and human-in-the-loop methods: they generated synthetic images with GAN methods based on latent vectors and asked people to interpret the semantic meaning of the images.

As few studies focus on finding bias, and considering the high cost of mass data annotation, more methods for discovering bias in datasets are needed.

6 From fairness to privacy

6.1 Fairness protection and privacy protection are closely related

Dwork et al. introduced the notion of individual fairness, which requires treating similar people similarly [31]. The idea can be seen as a generalization of differential privacy [146]. As described in [99], differential privacy involves a constraint on the rows of a data matrix, while fairness involves a constraint on the columns; the two are therefore tightly related. Inspired by this similarity, Dwork et al. [31] proposed a fairness protection method based on differential privacy that imposes a Lipschitz constraint on fairness metrics. Likewise, fairness protection methods have also been considered for privacy protection issues [147].

6.2 With machine learning

Ekstrand et al. first raised the question of whether statistical metrics of predictive results, such as equalized odds, were compatible with privacy [148]. They showed that the constraint of differential privacy might lead to fairness under certain conditions. Later, Jagielski et al. proposed two algorithms that can satisfy both differential privacy and equalized odds. Similarly, Xu et al. also achieved differential privacy and demographic parity at the same time [149]. Other studies include [150] and [151] with K-anonymity, and [152] and [153] with data mining.

6.3 With deep learning

New challenges have emerged from studies that focus on fairness protection in deep learning. Xu et al. and Bagdasaryan et al., for instance, find that privacy protection with stochastic gradient descent may lead to unfair results [123, 154], which underlines the need to achieve fairness protection and differential privacy at the same time. As for fair representations, Grgic-Hlaca et al. [24] regard the sensitive features of fairness as private information and propose methods that fit both fairness and privacy. Edwards et al. [54] use learned representations to hide private information in images, arguing that, in this way, image privacy and fairness can be achieved at the same time.

6.4 Future directions and conclusions

There are several outstanding challenges in the image fairness literature that have yet to be addressed. In Sect. 3, we indicated that future trends in this field may include: (1) exploring more and different settings, such as multiple [66] and continuous [72] sensitive features; and (2) examining newly emerging deep learning applications, such as deep clustering [74], adversarial training [75], and attacks [77].

Given the models and methods presented in Sects. 4 and 5, more techniques from related fields could be taken on board to ensure fairer representations. Some solutions already seem very promising, such as causality inference [65], domain generalization or invariant representation learning [142], and transfer learning [103].

In terms of the model utility concerns presented in Sect. 5, a systematic understanding of how data and feature modifications contribute to predictions is still lacking. We believe the causal-based [155, 156] methods or those based on interpretation [157] could be possible solutions. Moreover, as little attention has been paid to discovering sensitive features, further research is required to discover and represent them in real-world datasets [24].

Additionally, fairness protection and privacy protection have a great many overlaps. Yet, few methods have been proposed to achieve both [123, 154, 158]. Further research should be undertaken to explore methods that can bestow both types of protection, especially for settings with high dimensional inputs.

In this paper, we summarized deep learning based image fairness protection studies in three respects: problems, models, and challenges. Since image inputs differ from numerical inputs, we started by highlighting the differences and summarizing the research trends through different problem settings. We then introduced four approaches to fairness with deep model methods and their characteristics. Additionally, we discussed the three main challenges on the path to better fairness protection results. Last, we discussed the close relationship between fairness and privacy.

Although our focus remained solely fixed on image fairness studies in the realm of deep learning, we did introduce some problems and challenges that extend to fairness with numerical inputs and other high-dimensional data tasks, such as natural language processing, speech processing, or video processing. Problems with data bias are also common in the broader image processing area. The studies discussed share problems with other fields, such as domain adaptation, transfer learning, and long-tailed learning, as all aim to break spurious correlations and learn invariant features. We expect that these methods might stimulate cross-pollination among the fields.

Our survey concludes with some comparisons between fairness and privacy preservation in terms of problems and questions. Fairness protection methods can be aligned with accuracy and privacy preservation. This leaves room for further work to summarize fairness protection studies from a range of additional perspectives.