1 Introduction

Machine learning (ML)-based systems are becoming increasingly ubiquitous in today’s world, with their applications ranging from small embedded devices [like health monitoring in smartwatches (Esteva et al., 2019)] to large safety-critical systems [like autonomous driving (Fink et al., 2019)]. Their success is often attributed to the Neural Networks (NNs) deployed in these systems, which have the ability to learn and perform decision-making with high accuracy, without being explicitly programmed for their designated task. Typically, these NNs are trained on large datasets, with tens to hundreds of thousands of input samples, using various supervised training algorithms. Testing accuracy is often the most common (and frequently the only) metric used to analyze the performance of these NNs.

This spotlights two major limitations: (a) there is a notable reliance on large, labeled datasets, which are a significant challenge for the ML community to obtain, especially for new use-cases, and (b) the trained NN may exhibit problems like robustness bias, i.e., the robustness of the NN to noise is not the same across all output classes, a problem that is accentuated in the presence of noisy real-world data.

Even when large datasets are available, they may contain a significantly larger number of samples from one output/decision class than from the others. For instance, the MIT-BIH Arrhythmia dataset (Moody and Mark, 2001) contains a considerably larger number of normal ECG signals as compared to ECG signals indicating a specific arrhythmia. Likewise, the IMDB-WIKI dataset (Rothe et al., 2018) comprises mostly Caucasian faces. NNs trained on such datasets are, therefore, less likely to detect arrhythmias or non-Caucasian faces with high confidence, and the problem is aggravated under noisy input settings. However, the number of inputs from each output class is not the only parameter that leads to an imbalanced dataset.

1.1 Motivating example

Consider a NN trained on the Leukemia dataset (Golub et al., 1999)—details of the dataset and NN are provided in Sect. 5 along with further experiments. The training dataset contains an unequal number of inputs from the two output classes. Figure 1 (left) shows the classification performance of this network under the application of varying noise. Not surprisingly, the trained NN is more likely to misclassify inputs from the output class with fewer training inputs.

The experiments were then repeated, deleting randomly selected inputs from the class with a larger number of inputs in the training dataset each time, hence ensuring an equal number of inputs from both classes in the dataset. The graphs in Fig. 1 (right) give the classification performance of these networks under the application of varying noise. As shown in the graphs, simply having an equal number of inputs in both classes may still lead to a trained network significantly misclassifying inputs from one class.

Fig. 1

Networks trained on unequal (left) and equal (right) number of inputs from the classes: Label 0 and Label 1. All networks used the same network architecture and training hyper-parameters, and all indicate a higher likelihood of Label 0 being misclassified as compared to Label 1

It must also be noted that the bias becomes apparent only in the presence of noise, since the trained NNs do not indicate misclassifications in the absence of noise. Hence, the robustness bias in a trained NN may go undetected before the deployment of the NN in a real-world application. This underscores the need to address robustness bias and calls for a better characterization and acquisition of balanced datasets that may enable training unbiased NNs. However, obtaining such datasets is not a straightforward task.

The existing works dealing with bias alleviation either aim to improve the training algorithms to ensure unbiased training, or manipulate training data to obtain datasets that favor minimal NN bias. Yet, most of these works (Gat et al., 2020; Le Bras et al., 2020; Nam et al., 2020) encounter the following limitations, making robustness bias alleviation a challenging task:

  1.

    Most works (Li & Vasconcelos, 2019; Li et al., 2018; Zhao et al., 2017) focus on either the dataset bias, i.e., the lack of generalization of the available dataset to real-world data, or representation bias, i.e., flaws in the dataset acquired during its collection process. However, they rarely focus on biases like robustness bias, which generally becomes evident only during NN deployment, since noisy inputs are common in practical real-world systems.

  2.

    A limited notion of a balanced dataset is often used in the literature (Bagui & Li, 2021; Lemaître et al., 2017), i.e., a balanced dataset is one that contains an equal number of inputs from all output classes. However, as seen in our motivating example, such a dataset does not necessarily aid in the alleviation of robustness bias.

  3.

    They primarily focus on large datasets (Nam et al., 2020; Kim et al., 2019; Le Bras et al., 2020; Gat et al., 2020; Zhang et al., 2019), which provide a large pool of training samples to learn the input features from as well as to handpick a subset of inputs that favor an unbiased NN. However, such large datasets may not always be available.

  4.

    Some works focus on adding new input samples to the training dataset or at deeper network layers (Zhang et al., 2019). However, the heuristics for adding new inputs do not always favor a balanced dataset.

  5.

    The addition and deletion of input samples (Bagui & Li, 2021) may also lead to overfitting or reduction of the training dataset, respectively.

  6.

    The works also often focus on visual datasets, like colored MNIST or the IMDB dataset, where the existence of bias is perceptually easy to detect and comprehend (Wang et al., 2020; Zhao et al., 2017). However, the robustness bias problem may stretch beyond visual datasets, albeit often being difficult to (perceptually) detect in non-visual datasets.

1.2 Our novel contributions

To address the aforementioned limitations and challenges, this paper proposes the UnbiasedNets framework, which facilitates the detection and reduction (ideally elimination) of bias in a trained NN by addressing the bias at the root level, i.e., by reducing the bias within the training data, rather than relying on training algorithms to unlearn biases. Our framework is generic and hence can be implemented along with any training algorithm, using any programming language (including MATLAB, Python, C++, etc.). The novel contributions of the work are as follows:

  1.

    This work deals with robustness bias, which results from having an imbalanced dataset (which may in turn be a consequence of dataset bias, representation bias, or both), and alleviates the bias even in datasets where it may not be apparent in the absence of noisy inputs.

  2.

    We redefine the notion of balanced dataset to provide a more precise explanation of the extent to which the number of inputs from each output class is, or is not, essential for training unbiased NNs.

  3.

    Unlike the state-of-the-art approaches, UnbiasedNets can work efficiently to diversify the dataset even in the absence of a large dataset using K-means clustering and the noise tolerance of a NN previously trained on the dataset.

  4.

    Our novel framework can identify the practical bounds for generating synthetic input samples using clusters of input features obtained via K-means and the noise tolerance bounds of the trained network. To the best of our knowledge, UnbiasedNets is the only framework exploiting noise tolerance to obtain realistic bounds for synthetic inputs. We also make use of feature correlation from real-world inputs to ensure that the synthesized inputs are realistic.

  5.

    UnbiasedNets combines synthetic input generation with redundancy minimization to diversify and generate potentially balanced and equally-represented datasets, with not necessarily an equal number of inputs from all output classes.

  6.

    The framework is applicable in diverse application scenarios. We demonstrate this using UnbiasedNets on two real-world datasets, where the bias in the dataset is not always visually detectable, and hence may not be straightforward to address.

1.3 Paper organization

The rest of the paper is organized as follows. Section 2 gives an overview of the existing works for bias alleviation in NNs. Section 3 elaborates on the notions of balanced datasets, robustness, robustness bias, metric for bias estimation and noise tolerance, while also providing the relevant formalism. Section 4 then explains our novel data diversification framework, UnbiasedNets, to alleviate robustness bias from the training dataset. Sections 5 and 6 show the application of UnbiasedNets on real-world datasets, providing details of experiments, results, and analysis. Section 7 discusses the open future directions for the improvements in data diversification for alleviating robustness bias. Finally, Sect. 8 concludes the paper.

2 Related work

This section provides an overview of the current state-of-the-art on reducing bias in NNs. The summary of state-of-the-art, including approach categorization, their predominant focus on non-visual datasets, and their comparison to our novel UnbiasedNets approach, is given in Table 1. The bias alleviation approaches can be broadly classified into two major categories: (1) unbiased training algorithms (i.e., algorithm-centric (AC) approaches), and (2) bias reduction via dataset manipulation (i.e., data-centric (DC) approaches). Towards the end of the section, we also provide an overview of the current and on-going works targeting the recently discovered problem of robustness bias.

Table 1 Comparison of the state-of-the-art bias alleviation approaches with our proposed UnbiasedNets framework

2.1 Algorithm-Centric (AC) approaches

Training unbiased NN via AC approaches often involves splitting the network model into two separate but connected networks (Alvi et al., 2018; Kim et al., 2019; Nam et al., 2020). The first network aims at either identifying key input features or amplifying the bias present in the dataset. The second network, in turn, uses these features or accentuated bias to unlearn the bias from the network. Learning features at deeper NN layers during training for data augmentation (Zhang et al., 2019) has also been shown to aid unbiased training. In addition, knowledge of known biases in the dataset and a NN trained using standard cross-entropy loss has also been leveraged to develop a more robust NN (Sanh et al., 2020). Other AC bias reduction approaches include the incorporation of additional constraints during training to guide the NN in order to avoid learning unwanted correlations in data (Zhao et al., 2017).

For biases specific to multi-modal datasets (like colored MNIST (Kim et al., 2019), where the dataset contains two kinds of information: the colors and the numerals), the use of a training algorithm based on functional entropy is shown to perform better (Gat et al., 2020). A recent work (Li & Vasconcelos, 2019) also explores inputs in the dataset to identify the weights that the inputs must be encoded with before training, to successfully reduce the bias. The determination of invariants in inputs has also been proposed (Arjovsky et al., 2019) to enable unbiased training of a NN. In addition, recent work (Savani et al., 2020) also explores algorithms where instead of training an unbiased network from scratch, a trained NN and a dataset (not used during training) are used to fine-tune the network to be devoid of biases specific to a certain application.

However, as indicated earlier, these works are tailored for minimizing data and representation biases, generally for large datasets. The biases are often explored in visual datasets. In contrast, NNs deployed in the real-world often also deal with non-visual inputs, like patient’s medical data, where the existence of a bias (even the data and representation biases) may not always be easy to detect and hence may go unnoticed. Hence, bias alleviation poses a challenge in cases where the detection of bias is beyond visual perception. Moreover, the exploration of robustness bias is a fairly new research direction, and hence, the success of these AC approaches for minimizing robustness bias remains largely unexplored.

2.2 Data-centric (DC) approaches

The orthogonal direction to minimize bias is by manipulating the training dataset via DC approaches, to potentially eliminate the bias at its core. Among the simplest and most popular DC bias alleviation approaches are random over-sampling (ROS), i.e., random replication of inputs from the class with fewer input samples, and random under-sampling (RUS), i.e., random deletion of inputs from the class with a significantly larger portion of the available inputs (Bagui & Li, 2021; Leevy et al., 2018). The idea is to obtain a dataset with an equal number of inputs from each class. However, RUS is known to reduce the number of input samples available for the NN to learn from, while ROS may lead to overfitting the training data.

The synthetic minority over-sampling (SMOTE) (Chawla et al., 2002) and adaptive synthetic sampling (ADASYN) (He et al., 2008) techniques provide an improvement over ROS by synthesizing new points in the class with fewer samples, using the available inputs as reference for the synthesis of new input samples (Lemaître et al., 2017). However, the general assumption in these works is that having an equal number of inputs for each of the classes ensures a balanced dataset, and in turn ensures an absence of bias (Bagui & Li, 2021; Picek et al., 2019). As such, these approaches deploy data manipulation only for the output class with a smaller number of inputs. As observed in the motivating example in Sect. 1, this assumption provides a limited notion of balanced datasets. In addition, these works neither have the means to ensure that the newly generated inputs in fact belong to the minority class (i.e., the output class with fewer inputs), nor the sophistication to determine the number of inputs that must be added to the class to alleviate bias.

Other works explore heuristics to identify the inputs that must be removed from the training dataset (Le Bras et al., 2020; Li et al., 2018) for obtaining an unbiased NN. However, for most real-world applications, large labeled datasets may not always be available, except to a few tech giants. This leaves limited scope for bias alleviation in tasks relying on small datasets.

In summary, the DC approaches again focus on alleviating representation and data bias, i.e., the biases pertaining to faulty data acquisition and the lack of data generalizing well to all output classes. Alleviation of robustness bias remains an unexplored research direction in the existing works. The notion of a balanced dataset often used in these works is too naive. The approaches relying on the deletion of inputs from the training dataset are suitable only for large datasets, to ensure sufficient inputs remain for NN training. For the augmentation approaches (like ROS, SMOTE and ADASYN), i.e., the approaches where synthetic inputs are added to the training dataset (henceforth referred to as data augmentation), the new inputs are placed in close proximity to existing, randomly selected inputs. The new inputs may or may not be realistic for the real-world input domain. The validation of these generated synthetic inputs relies solely on them being a part of NN training, and on how well the trained NN works with the testing dataset.

2.2.1 Bias and the focus on visual datasets

As highlighted in Sect. 1, NNs are deployed in a diverse range of applications. These include networks performing classification and decision-making tasks for visual inputs (Vu et al., 2022; Li et al., 2021). Yet, a large portion of NN applications, for instance, banking (Asha & KR, 2021), environmental forecast (Benali et al., 2019), finance (Calvo-Pardo et al., 2020) and spam filtering (Barushka and Hajek, 2018), accept non-visual inputs. However, most literature pertaining to bias analysis (Alvi et al., 2018; Gat et al., 2020; Kim et al., 2019; Nam et al., 2020; Li et al., 2018; Li & Vasconcelos, 2019; Zhang et al., 2019; Zhao et al., 2017) focuses (often solely) on NNs working on visual datasets—this comes as no surprise, since a bias in these datasets is visually perceptible to human analysts, who are inclined to perceive visual cues better than non-visual ones (for instance, consider the case of visual capture, where visual senses are observed to dominate over auditory senses (Welch, 1999)).

The NNs using non-visual inputs often deploy similar network architectures as those using visual inputs. Intuitively, these NNs are likely to be as biased as their counterparts used in visual applications. Yet, the difficulty in perceiving the bias in non-visual datasets makes their bias analysis a scarcely explored research area, as evident in the lack of existing works in the domain.

Such dominant focus on visual datasets is not unique to the study of bias but is, in fact, also observed in fields like visual analytics, where non-visual aspects of the system are transformed into visual aspects. For example, the neuron activations are presented graphically (visually) in the research on network interpretability (Becker et al., 2020) and security (Liu et al., 2018), which enables problem identification (detection). This in turn motivates deeper research/solutions.

2.3 Current and ongoing efforts

The vulnerability of NNs to robustness bias has only recently been discovered (Nanda et al., 2021). Hence, the efforts to resolve this particular category of bias are still limited. Nevertheless, a few AC approaches have been proposed within the last year to alleviate such bias. These include a multi-objective training algorithm (Xu et al., 2021), which minimizes both the standard error (which dictates the classification accuracy of the network) and the boundary error (since inputs from class(es) closer to the decision boundary are expected to be more vulnerable to noise), thereby minimizing the bias. However, later work (Nayak et al., 2022) comes to a contrary conclusion, i.e., even inputs with the same distance to the classification boundary may have different vulnerabilities to noise. A re-weighting approach has also been proposed (Benz et al., 2021), which aims to update parameter values during training whenever the accuracy of a particular output class deviates too much from the average accuracy of the network.

Recent work (Benz et al., 2021) also notes that the bias in the NNs exists due to the dataset (and its features) itself, rather than depending on the NN model or its optimization factors. Yet, to the best of our knowledge, no DC effort has been proposed to alleviate bias from the dataset itself. It is interesting to note that adversarial training, a popular approach found successful in ensuring the robustness of NN against noise (concept explained later in Sect. 3.2), is found to aggravate the bias (Tian et al., 2021).

3 Preliminaries

This section describes the notions and provides the relevant formalism for balanced datasets, robustness, robustness bias, bias estimation and noise tolerance (Nanda et al., 2021; Naseer et al., 2020), which form the basis of UnbiasedNets. The terminology and notations introduced in the section will be used throughout the rest of the paper.

3.1 Balanced datasets

Contrary to the popular notion, i.e., that a balanced dataset (Bagui & Li, 2021; Lemaître et al., 2017) consists of an equal number of inputs from all output classes, we define a balanced dataset to be one where all output classes are equally-represented.

Definition 1

(Balanced Dataset) Given a dataset X with \({\mathcal {L}}\) output classes (i.e., \(Y_1,Y_2,...,Y_{\mathcal {L}}\)), the dataset is said to be balanced/the output classes are equally represented iff density \(\rho\) of inputs from each class in the input hyperspace is (approximately) equal, i.e., \(\rho ({Y_1}) \approx \rho ({Y_2}) \approx ... \approx \rho ({Y_{\mathcal {L}}})\). Note that density \(\rho\) of input here refers to the average number of input samples contained within the unit hypervolume of the valid input domain for an output class.

This implies that a network trained on such a balanced dataset would potentially be equally likely to identify inputs from all the classes, without a bias (explained in Sect. 3.3).
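As an illustration of Definition 1, the per-class density \(\rho\) can be roughly approximated by dividing the number of samples of a class by the hypervolume of the axis-aligned bounding box enclosing them; the sketch below follows this approximation (the bounding-box simplification and the helper name are ours, not part of the framework).

```python
import numpy as np

def class_density(X, y, eps=1e-9):
    """Rough per-class density estimate: sample count divided by the
    hypervolume of the axis-aligned bounding box of the class inputs.
    The bounding box is only a crude stand-in for the valid input domain."""
    densities = {}
    for label in np.unique(y):
        X_c = X[y == label]
        ranges = X_c.max(axis=0) - X_c.min(axis=0)
        volume = np.prod(ranges + eps)      # eps avoids zero-width features
        densities[label] = len(X_c) / volume
    return densities

# A dataset is (approximately) balanced if these densities are close to each other.
```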

3.2 Robustness

Robustness is the property of a NN signifying that the application of noise \(\Delta x\) to the inputs does not change what the trained NN originally learned about the inputs.

Definition 2

(Robustness) Given a trained network \(N: X \rightarrow Y\), N is said to be robust against the noise \(\Delta x\) if the application of an arbitrary noise \(\eta \le \Delta x\) to the input \(x \in X\) does not change network’s classification of x, i.e., \(\forall \eta \le \Delta x: N(x + \eta ) = N(x)\).

It must be noted that x corresponds to inputs that the network N does not originally misclassify, i.e., N(x) corresponds to the true output class for input x. For the purpose of this work, we assume the noise \(\eta\) to be bounded within the L\(^\infty\) space around input x, with the radius of \(\Delta x\)—this is one of the most popular noise models used in the NN analysis literature. Nevertheless, it is fairly straightforward to opt for any other type of (L\(^p\)-norm bounded) noise for the framework.
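For illustration, a simple sampling-based check of Definition 2 is sketched below; it only probes random noise vectors within the L\(^\infty\) ball of radius \(\Delta x\) and is therefore a necessary, but not sufficient, test of robustness (the helper name and the generic predict function are our assumptions).

```python
import numpy as np

def is_empirically_robust(predict, x, delta_x, n_trials=1000, rng=None):
    """Sampling-based check of Definition 2: draw noise eta with
    ||eta||_inf <= delta_x and verify the predicted class of x is unchanged.
    `predict` maps a batch of inputs to class labels (e.g., a wrapper
    around a trained model)."""
    rng = np.random.default_rng() if rng is None else rng
    base_label = predict(x[None, :])[0]
    noise = rng.uniform(-delta_x, delta_x, size=(n_trials, x.size))
    labels = predict(x[None, :] + noise)
    return bool(np.all(labels == base_label))
```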

3.3 Robustness bias

Section 1 highlighted the well-studied NN biases in literature, i.e., data and representation bias. This paper instead deals with robustness bias (henceforth referred to as only bias) proposed by Nanda et al. (2021) and Joshi et al. (2022), which is a property of the dataset where a specific output class may or may not be robust under the application of noise. More specifically, it can be defined as follows:

Definition 3

(Robustness Bias) Given a dataset X with \({\mathcal {L}}\) output classes (i.e., \(Y_1,Y_2,...,Y_{\mathcal {L}}\)), and \({\mathcal {D}}_{Y_1},{\mathcal {D}}_{Y_2},...,{\mathcal {D}}_{Y_{\mathcal {L}}}\) as the input sub-domains representing each output class, X is said to exhibit robustness bias iff the sub-domains \({\mathcal {D}}_{Y_1},{\mathcal {D}}_{Y_2},...,{\mathcal {D}}_{Y_{\mathcal {L}}}\) are not equidistant from the decision boundary.

Naturally, the sub-domains \({\mathcal {D}}_{Y_1},{\mathcal {D}}_{Y_2},...,{\mathcal {D}}_{Y_{\mathcal {L}}}\) may be disjoint or overlapping. However, as long as the sub-domains are equidistant from the decision boundary, the dataset is said to be free from a robustness bias. A NN trained on such a dataset is said to be unbiased, since intuitively, for a NN with a decision boundary equidistant from all input sub-domains, all output classes must be equally robust to noise.

However, given the large number of input features (forming an input hyperspace) in practical datasets, it is not easy to visualize the bias in the dataset itself. Hence, we define the notion of biased NN, which aids in identifying the robustness bias in the dataset via analyzing the NN trained on the dataset:

Definition 4

(Biased Network) Given a trained network \(N: X \rightarrow Y\), N is said to be biased if the application of an arbitrary noise \(\eta \le \Delta x\) to any (correctly classified) input from class \(X_i \subset X\) does not change the network’s output classification, i.e., \(\forall \eta \le \Delta x, x_i \in X_i: N(x_i + \eta ) = N(x_i)\), while the application of the same noise to inputs from another class \(X_j \subset X\) makes the network misclassify originally correctly classified inputs from that class, i.e., \(\forall \eta \le \Delta x, x_j \in X_j: N(x_j + \eta ) \ne N(x_j)\).

It must be noted that even though unbiasedness (i.e., the property of a trained NN to be unbiased) and classification accuracy may intuitively seem similar, they are not identical. Obtaining an accurate NN involves identifying the decision boundary that separates the output classes in the dataset. In contrast, obtaining an unbiased NN involves identifying a decision boundary that is equidistant from all the sub-domains encapsulating the different output classes. The resulting unbiased network, in turn, may or may not have the highest classification accuracy. However, all the output classes will likely be equally robust to noise in an unbiased network.

3.4 Metric for robustness bias

In practice, it is often impossible to obtain a completely unbiased NN. Hence, a metric is required to quantify and analyze the bias in the network. Let \(R_i\) be the ratio of misclassified to correctly classified inputs from class i, which defines the average tendency of inputs from output class i to be misclassified. We define the metric to estimate robustness bias (\({\mathcal {B}}_R\)) as follows:

$$\begin{aligned} {\mathcal {B}}_R = \max _{i \in {\mathcal {L}}}~ abs\bigg (R_i - \frac{\sum _{j\in {\mathcal {L}}\setminus i}R_j}{\mid {\mathcal {L}}\mid - 1}\bigg ) \end{aligned}$$

where \({\mathcal {L}}\) is the set of all output classes. Having a \({\mathcal {B}}_R\) of zero indicates an equal \(R_i\) across all output classes, and therefore an unbiased NN. Consequently, larger \({\mathcal {B}}_R\) implies higher bias. It must also be noted that the (absolute) difference in ratios \(R_i\) and \(R_j\) is generally different across the different pairs of output classes. In order not to reduce (nullify) the impact of the differences (and hence that of the bias in the network), the maximum difference, rather than the average, is used to estimate the bias in NN.

Contrary to the formal notion of robustness bias, as provided in Definition 3, \({\mathcal {B}}_R\) uses the inputs to quantify bias rather than the decision boundary of the NN. This is a viable approach since the exact decision boundary of the NN is often hard to visualize for the multi-dimensional input space. The metric \({\mathcal {B}}_R\), instead, makes use of a measurable/quantifiable entity, i.e., the input classification, to estimate the bias. As stated earlier, the ratio \(R_i\) provides the tendency of the boundary to misclassify the inputs from class i. This is compared to the average tendency of misclassification of inputs from the other output classes \(R_j\)—this is analogous to comparing the distance of the inputs of different classes to the decision boundary. Hence, if the ratio \(R_i\) is equal for all classes (analogously, all classes are equidistant from the decision boundary), \({\mathcal {B}}_R\) computes to zero, and the NN can then be deemed unbiased.
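The metric can be computed directly from per-class classification counts under noise; a minimal sketch (our own helper, not part of the released framework) is given below.

```python
import numpy as np

def robustness_bias(misclassified, correct):
    """Compute B_R from per-class counts of misclassified and correctly
    classified noisy inputs (arrays of length |L|). R_i is the ratio of
    misclassified to correctly classified inputs of class i."""
    R = np.asarray(misclassified, dtype=float) / np.asarray(correct, dtype=float)
    L = len(R)
    # Compare each R_i with the mean ratio of all remaining classes.
    return max(abs(R[i] - (R.sum() - R[i]) / (L - 1)) for i in range(L))

# Example: ratios 0.5 and 0.1 for a binary classifier give B_R = 0.4.
print(robustness_bias([5, 1], [10, 10]))   # -> 0.4
```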

3.5 Noise tolerance

Similar to robustness, noise tolerance also checks the classification performance of a NN for inputs under the application of noise. However, it is a stronger property than robustness (i.e., noise tolerance to a specific noise implies robustness to the noise as well), in that it provides the bounds within which the addition of noise does not change the classification of the inputs by a trained NN.

Definition 5

(Noise Tolerance) Given a trained network \(N: X \rightarrow Y\), noise tolerance is defined as the maximum noise \({\Delta x}_{max}\), which can be applied to a correctly classified input \(x \in X\) such that N does not misclassify the input. Hence, for any arbitrary noise \(\eta \le \Delta x_{max}\), the application of noise to an input \(x \in X\) does not change network’s classification of x, i.e., \(\forall \eta \le \Delta x_{max}: N(x + \eta ) = N(x)\).

Alternatively, noise tolerance can be viewed as the largest \(\delta\)-ball (\(l^\infty\) norm ball) around the inputs, such that \(\delta = {\Delta x}_{max}\) and any input within this ball is correctly classified by the NN. Consequently, this knowledge can in turn be used to estimate the region around seed inputs where the realistic synthetic inputs may reside and still be correctly identified by a trained NN.

4 UnbiasedNets: framework for bias alleviation

We categorize UnbiasedNets into two major tasks: bias detection using a trained NN to identify the existence of robustness bias followed by bias alleviation to diversify the training dataset to eliminate the bias at its core. Figure 2 provides an overview of our proposed methodology.

Fig. 2

Overview of the UnbiasedNets framework incorporating the proposed methodology starting with a trained NN undergoing bias detection, followed by bias alleviation, ultimately leading to a diversified dataset and potentially unbiased trained NN

4.1 Bias detection

The first step here is the application of noise \(\eta\), bounded by the small noise bounds \(\Delta x\) to the inputs present in the testing dataset \(x \in X\) (shown as Block 0 in Fig. 2) to obtain the noisy inputs \(x_n\).

$$\begin{aligned} x_n = x + \eta ~~~~s.t.~~~~ \eta \le \Delta x \end{aligned}$$
(1)

The noisy inputs are then supplied to the trained NN, and their output classifications are compared to the classifications of the inputs in the absence of noise. For the network to be robust (see Definition 2), the NN’s classification must not change under the influence of noise. The noise is then iteratively increased until it exceeds the maximum noise at which the NN does not misclassify the inputs, i.e., until it exceeds the NN’s noise tolerance (see Definition 5). This iterative increment of noise provides the noise tolerance bounds of the network.

The application of noise larger than the noise tolerance bounds of the NN entails that the NN misclassifies some or all the noisy inputs. These misclassifying noise patterns (i.e., the counterexamples) act as inputs for the counterexample analysis. These noise patterns can be collected either using a formal framework [such as the ones based on model checking used by Naseer et al. (2020) and Bhatti et al. (2022)] or an empirical approach [like the Fast Gradient Sign Method (FGSM) attack (Goodfellow et al., 2015)].

During counterexample analysis, the collected noise patterns, and in turn the misclassified inputs, are used to compute the \({\mathcal {B}}_R\) of the network to detect the presence and severity of robustness bias in the trained NN. A non-zero \({\mathcal {B}}_R\) implies a robustness bias in the network. Additionally, the number of misclassified inputs from each class is also used to determine the number of synthetic inputs required in the training dataset (elaborated in Sect. 4.2.4) to alleviate the bias.
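While the paper relies on a formal (model-checking based) setup for counterexample generation, a simplified empirical stand-in for this bias detection step could look as follows; the random L\(^\infty\) noise sweep, integer class labels, and function names are our simplifying assumptions.

```python
import numpy as np

def detect_bias(predict, X_test, y_test, noise_levels, n_trials=100, rng=None):
    """Simplified, empirical stand-in for the bias-detection step: sweep
    increasing L_inf noise magnitudes and count, per output class, how many
    originally correctly classified inputs get misclassified. Assumes integer
    labels 0..L-1; `predict` maps a batch of inputs to labels."""
    rng = np.random.default_rng() if rng is None else rng
    keep = predict(X_test) == y_test        # drop inputs misclassified without noise
    X, y = X_test[keep], y_test[keep]
    n_classes = int(y_test.max()) + 1
    per_level = {}
    for eps in noise_levels:                # e.g., 1-40% of the input magnitude
        miss = np.zeros(n_classes, dtype=int)
        for _ in range(n_trials):
            eta = rng.uniform(-eps, eps, size=X.shape)
            wrong = predict(X + eta) != y
            miss += np.bincount(y[wrong], minlength=n_classes)
        per_level[eps] = miss               # counterexample counts per class
    return per_level
```

The per-class counts at a given noise level can then be fed to the \({\mathcal {B}}_R\) computation above and used to derive the per-class synthetic input counts discussed in Sect. 4.2.4.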

4.2 Bias alleviation

Using the noise tolerance available from the bias detection and the feature extremum of the inputs from the training dataset, we provide the step-by-step bias alleviation methodology. The aim of the methodology is to identify the valid input domain for the generation of synthetic data and provide a diversified training dataset for the training of a potentially unbiased NN. The details of each step in the methodology are as follows.

4.2.1 Bounds determination

For each input feature in every output class, the feature extremum, i.e., the maximum and minimum value of the feature as per the available training data, is first identified (as shown in Block 1 of Fig. 2). As discussed earlier, inputs with noise less than the allowed noise tolerance are still likely to be correctly classified by a trained NN. Hence, the feature bounds are relaxed using \(\Delta x_{max}\), to provide a larger input space for the diversified inputs (also shown in Fig. 3a), as follows:

Theorem 1

(Bound Relaxation using Noise Tolerance) For input domain X, let \([\underline{x_i}, \overline{x_i}]\) represent the bounds of inputs belonging to \(X_i\) (where \(X_i \subset X\)) and \({\Delta x}_{max}\) be the noise tolerance of the network. From Definition 5, we know that the application of noise within the tolerance of the network does not change the output classification. Hence, more realistic input bounds \([\underline{x'_i}, \overline{x'_i}]\) can be obtained using the laws of interval arithmetic as:

$$\begin{aligned} \underline{x'_i} = min\big (\big (\underline{x_i}-\Delta x_{max}\big ),\big (\underline{x_i}+\Delta x_{max}\big ),\big (\overline{x_i}-\Delta x_{max}\big ),\big (\overline{x_i}+\Delta x_{max}\big )\big ), \\ \overline{x'_i} = max\big (\big (\underline{x_i}-\Delta x_{max}\big ),\big (\underline{x_i}+\Delta x_{max}\big ),\big (\overline{x_i}-\Delta x_{max}\big ),\big (\overline{x_i}+\Delta x_{max}\big )\big ) \end{aligned}$$

It must be noted that, due to the scalability limitations of the underlying bias detection framework [for instance (Naseer et al., 2020)], where the application of large noise to NN inputs may lead to very large formal models that are not suitable for analysis, the noise tolerance may not always be available for bound relaxation. A similar challenge is encountered for NNs with a very low noise tolerance. Consider the example of a NN trained on an image dataset, where the addition of noise leading to a magnitude change of even 1.0 in the pixel value of an image may still lead to misclassification (Ma et al., 2021). This indicates a very low noise tolerance. Under these conditions, UnbiasedNets assumes the noise tolerance to be zero, and proceeds with the feature extremum as the feature bounds obtained during bound determination.
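Since \(\Delta x_{max}\) is non-negative, the min/max expressions of Theorem 1 collapse to simply widening each per-class feature interval on both sides; a minimal sketch, assuming the tolerance is available as a single scalar (or per-feature array), is shown below.

```python
import numpy as np

def relaxed_bounds(X_class, delta_x_max=0.0):
    """Per-feature bounds of one output class, relaxed by the noise tolerance
    (Theorem 1). With no usable tolerance, delta_x_max = 0 falls back to the
    raw feature extremum. Returns (lower, upper) arrays over the features."""
    lower = X_class.min(axis=0) - delta_x_max
    upper = X_class.max(axis=0) + delta_x_max
    return lower, upper
```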

Fig. 3

a Realistic bounds determination for individual feature bounds using available training inputs, K-means clustering and noise tolerance, b Bound tightening to eliminate/reduce bound overlap for synthetic input generation

4.2.2 Bound tightening

Bounds obtained from the previous step identify the regions in the input space where real inputs from the training dataset exist, and hence provide an estimate for the generation of valid synthetic data. However, it is possible for the feature bounds of different output classes to overlap, as shown in Fig. 3b. The overlap can be either partial or complete. This provides a means for tightening the feature bounds (shown as Block 2 in Fig. 2), hence leading to a smaller, yet realistic, input space for the generation of synthetic data. This in turn ensures that fewer iterations are required for realistic synthetic input generation in the later steps of the framework. The generation of tighter feature bounds in the case of partial overlap can be seen as follows:

Theorem 2

(Bound Tightening in case of Partial Overlap) Given the bounds of input feature a for inputs belonging to class i and j to be \(\big [\underline{x_i^a}, \overline{x_i^a}\big ]\) and \(\big [\underline{x_j^a}, \overline{x_j^a}\big ]\), respectively, the bounds can be tightened to \(\big [\underline{x_i^a}, \underline{x_j^a}\big ]\) and \(\big [\overline{x_i^a}, \overline{x_j^a}\big ]\) provided that \(\underline{x_i^a} < \underline{x_j^a}\) and \(\overline{x_i^a} < \overline{x_j^a}\) (i.e., the bounds overlap partially). Then, any input belonging to the new bounds also belongs to the original feature bounds as well.

$$\begin{aligned} \begin{aligned} \forall i,j. \big (\big (\big [\underline{x_i^a},\overline{x_i^a}\big ] \in X_i^a \wedge&\big [\underline{x_j^a},\overline{x_j^a}\big ] \in X_j^a \big ) \implies \big (\big [\underline{x_i^a},\underline{x_j^a}\big ] \in X_i^a \wedge \big [\overline{x_i^a},\overline{x_j^a}\big ] \in X_j^a \big )\big ) \\&s.t.~~ \underline{x_i^a}< \underline{x_j^a}< \overline{x_i^a} < \overline{x_j^a} \end{aligned} \end{aligned}$$

However, the same cannot be generalized for complete overlap since the bounds of one label form a subset of the other. As such, tightening is possible for a single label only.

Theorem 3

(Bound Tightening in case of Complete Overlap) Given the bounds of input feature a for inputs belonging to class i and j to be \(\big [\underline{x_i^a}, \overline{x_i^a}\big ]\) and \(\big [\underline{x_j^a}, \overline{x_j^a}\big ]\), respectively, the bounds for feature a of class i, \(X_i^a\), can be tightened to \(\big [\underline{x_i^a}, \underline{x_j^a}\big ]\) and \(\big [\overline{x_j^a}, \overline{x_i^a}\big ]\) provided that \(\underline{x_i^a} < \underline{x_j^a}\) and \(\overline{x_j^a} < \overline{x_i^a}\). Then, any input belonging to the new bounds for \(X_i^a\) also belongs to the original feature bounds as well.

$$\begin{aligned} \begin{aligned} \forall i,j. \big (\big (\big [\underline{x_i^a},\overline{x_i^a}\big ] \in X_i^a \wedge&\big [\underline{x_j^a},\overline{x_j^a}\big ] \in X_j^a ) \implies \big (\big [\underline{x_i^a},\underline{x_j^a}\big ] \in X_i^a \wedge \big [\overline{x_j^a},\overline{x_i^a}\big ] \in X_i^a \big )\big ) \\&s.t.~~ \underline{x_i^a}< \underline{x_j^a}< \overline{x_j^a} < \overline{x_i^a} \end{aligned} \end{aligned}$$

Motivating Example Consider an arbitrary feature a with valid input values in the range [0, 10]. Let the inputs from class i have the bounds [2, 8] and those from class j have the bounds [7, 10], for the feature a. Without bound tightening, any input \(7<x^a<8\) can belong to either class i or j (but not both). On the contrary, bound tightening reduces the bounds of the feature a for classes i and j to [2, 7] and [8, 10], respectively. This reduces the valid input domain for feature a such that it is impossible to pick a sample for feature a that may belong to more than a single output class, hence simplifying the task of generating realistic synthetic input samples.
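The two tightening rules translate directly into code for a single feature; the sketch below (with our own function names) reproduces the motivating example.

```python
def tighten_partial(bounds_i, bounds_j):
    """Theorem 2: partially overlapping bounds (l_i < l_j < u_i < u_j).
    Returns non-overlapping bounds for classes i and j."""
    (l_i, u_i), (l_j, u_j) = bounds_i, bounds_j
    assert l_i < l_j < u_i < u_j
    return (l_i, l_j), (u_i, u_j)

def tighten_complete(bounds_i, bounds_j):
    """Theorem 3: class j's bounds lie completely inside class i's bounds
    (l_i < l_j < u_j < u_i). Class i is reduced to the two side intervals;
    class j stays unchanged."""
    (l_i, u_i), (l_j, u_j) = bounds_i, bounds_j
    assert l_i < l_j < u_j < u_i
    return [(l_i, l_j), (u_j, u_i)], bounds_j

# Motivating example: class-i bounds [2, 8] and class-j bounds [7, 10] for feature a.
print(tighten_partial((2, 8), (7, 10)))   # -> ((2, 7), (8, 10))
```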

4.2.3 Feature clustering

The previous steps in the framework make use of the entire training dataset to obtain realistic feature bounds. However, real-world training datasets often contain outliers, i.e., inputs that do not occur frequently in practical scenarios. To subsume this characteristic into the synthetic inputs generated, further bound tightening is carried out (shown as Block 3 in Fig. 2) on the top-k input features, i.e., the k features with the smallest distance from the cluster centroid to the farthest input.
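One possible realization of this top-k selection with scikit-learn's KMeans is sketched below; the per-feature clustering, the number of clusters, and the function name are our own choices for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def top_k_features(X_class, k, n_clusters=2, seed=0):
    """Rank the features of one output class by cluster compactness: for each
    feature, cluster its values with K-means and record the largest distance
    from a centroid to the farthest value assigned to it. The k features with
    the smallest such distance are the least affected by outliers."""
    spreads = []
    for f in range(X_class.shape[1]):
        values = X_class[:, [f]]            # keep 2-D shape for KMeans
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(values)
        dists = np.abs(values[:, 0] - km.cluster_centers_[km.labels_, 0])
        spreads.append(dists.max())
    return np.argsort(spreads)[:k]
```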

4.2.4 Synthetic input generation

Using the feature bounds obtained from the previous step, random input values are chosen within the available bounds (shown as Block 4 in Fig. 2). The number of inputs to be added to each output class \(\chi _i\) is determined on the basis of the ratio of the percentage of misclassified inputs from class i (i.e., \(\mu _i\)) to the percentage of misclassified inputs from the class with the minimum misclassifications (i.e., \(min(\mu _L)\)), using the counterexamples recorded during bias detection. Hence, the class with a higher \(\mu _i\) gets the most synthetic inputs added to the dataset.

Algorithm 1 outlines the entire synthetic data generation process, starting from the training dataset and noise tolerance bounds. Function classSegment (Line: 3) splits the dataset into non-overlapping subsets of inputs belonging to each class, globalExt (Line: 5) provides feature bounds using feature extremum, nonOverlapping (Line: 8) performs bound tightening on basis of Theorems 2 and 3, minDist (Line: 10) identifies the top-k features based on k-means clustering, boundsFinal (Line: 12) performs further bound tightening based on the top features, and randInp (Line: 15) finally generates the synthetic inputs for each output class.

Algorithm 1 Synthetic data generation
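As a concrete sketch of the final steps of Algorithm 1, the snippet below shows one way to derive the per-class synthetic input counts \(\chi _i\) from the misclassification percentages and to draw random samples within the tightened bounds; the base_count scaling and the uniform draw are our simplifications (see also the distribution remark below).

```python
import numpy as np

def inputs_per_class(mu, base_count):
    """chi_i: number of synthetic inputs per class, proportional to the ratio
    of the class's misclassification percentage mu_i to the smallest one
    (the least misclassified class receives base_count synthetic inputs)."""
    mu = np.asarray(mu, dtype=float)
    return np.round(base_count * mu / mu.min()).astype(int)

def rand_inputs(lower, upper, n, rng=None):
    """randInp: draw n synthetic inputs uniformly within the tightened
    per-feature bounds [lower, upper] of one output class."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.uniform(lower, upper, size=(n, len(lower)))
```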
Fig. 4

Redundancy minimization by \(50\%\) in a two-dimensional input space

It must be noted that the above input generation assumes an implicit hyperrectangular distribution of the input domain. This means that each input feature may take any input value (from within the defined input bounds) with equal likelihood. However, it is also possible for the input features to have non-rectangular distributions. Assuming these distributions to be known a priori, the random input generation could be modified to select input values, from within the input bounds, according to their probability of occurrence in their exact input distributions, i.e., with the more probable values having a higher likelihood of selection and vice versa.

4.2.5 Redundancy minimization

Oversampling may lead an NN to overfit to the training samples. Moreover, the existence of similar inputs, after the addition of synthetic inputs, does not add to the diversity of the dataset. Existing works also indicate that training the NNs on smaller datasets—for instance, those obtained by eliminating input instances leveraging different distance metrics—may reduce the timing overhead for training while providing comparable classification accuracy (Fuangkhon, 2022; Kotsiantis et al., 2006; Wang et al., 2009). (Also see Appendix A for case studies indicating how redundancy minimization using K-means deletion reduces the bias of the actual NNs).

Hence, \(x\%\) of the closely resembling inputs from each class are removed to minimize the redundancy in the diversified training dataset (shown as Block 5 in Fig. 2). This is done by generating \(\frac{1}{x}\) clusters for each output class and then retaining a single input from each cluster. The result is a dataset with input samples covering a diverse input space, without densely populating any specific region of the input space (as realized in Fig. 4).
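A sketch of this K-means based redundancy minimization is given below; we interpret the removal of \(x\%\) of the inputs as clustering each class into \((1-x/100)\cdot N\) clusters and retaining the input closest to each centroid (our interpretation, chosen to match the example in Fig. 4).

```python
import numpy as np
from sklearn.cluster import KMeans

def minimize_redundancy(X_class, removal_pct=50, seed=0):
    """Drop closely resembling inputs of one output class: cluster the class
    and keep only the input nearest to each centroid. removal_pct = 50 keeps
    roughly half of the samples (cf. Fig. 4)."""
    n_keep = max(1, int(round(len(X_class) * (1 - removal_pct / 100))))
    km = KMeans(n_clusters=n_keep, n_init=10, random_state=seed).fit(X_class)
    kept = []
    for c in range(n_keep):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(X_class[members] - km.cluster_centers_[c], axis=1)
        kept.append(members[np.argmin(dists)])   # one representative per cluster
    return X_class[np.sort(kept)]
```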

4.2.6 Dataset validation

Up until the previous step, UnbiasedNets used real-world inputs to identify valid input space within which the inputs exist, used knowledge of the percentage of misclassified inputs from each output class to identify the number of synthetic inputs to generate, and minimized the redundancy in the generated input samples to obtain a diversified dataset. However, features in the real-world data may be correlated, and the synthetic input features, despite lying in the valid input domain, may not follow the correlation of real-world data. Hence, this step aims to validate the synthetic inputs by comparing their feature correlation with that of the original training data. If the percentage difference between the correlation coefficients is within \(t\%\), the new inputs are deemed suitable for training a potentially unbiased NN. Otherwise, the process of synthetic data generation is repeated until the feature correlation of the synthetic inputs resembles that of the original training dataset (shown as Block 6 in Fig. 2).

The choice of t is made on the basis of the percentage difference between the correlation coefficients of the training and testing datasets. However, if this difference is too large, the features may simply be independent, or obtaining appropriate correlations may require some input pre-processing (Zhao et al., 2006). The simple Pearson correlation coefficient alone, on such raw data, may not be an appropriate statistical measure to ensure that the synthetic inputs are realistic. (Check Appendix B for more insights into this.)
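The validation step can be realized by comparing the pairwise Pearson correlation matrices of the original and synthetic inputs; the percentage-difference criterion below is a minimal sketch of this check (the exact acceptance rule used by the framework may differ).

```python
import numpy as np

def correlations_match(X_real, X_synth, t_pct=10.0, eps=1e-9):
    """Accept the synthetic inputs if every pairwise Pearson correlation
    coefficient deviates from that of the real data by at most t_pct percent
    (relative to the real coefficient). Otherwise, generation is repeated."""
    c_real = np.corrcoef(X_real, rowvar=False)
    c_synth = np.corrcoef(X_synth, rowvar=False)
    rel_diff = np.abs(c_synth - c_real) / (np.abs(c_real) + eps) * 100.0
    return bool(np.all(rel_diff <= t_pct))
```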

5 Experiments

This section describes our experimental setup, and details of NNs and datasets used in our experiments.

5.1 Experimental setup

All experiments were carried out on a CentOS-7 system running on a 3.1 GHz 6-core Intel i5-8600. Our UnbiasedNets framework was implemented in MATLAB. The NN training was carried out using Keras.

However, the setup did not make use of any special libraries and, hence, can be easily re-implemented using any programming language(s). Bias detection (and counterexample generation) was carried out using SMV models with applied noise in the range of 1–40% of the actual input values, using a timeout of 5 minutes for each input.

5.2 Datasets and neural network architecture

We experimented on the Leukemia dataset (Golub et al., 1999), which is composed of the genetic attributes of Leukemia patients classified as either Acute Lymphoblastic Leukemia (ALL) or Acute Myeloid Leukemia (AML). The training dataset consists of 38 input samples (with 27 and 11 inputs indicating ALL and AML, respectively), while the testing dataset contains 34 inputs (with 20 and 14 ALL and AML inputs, respectively). We trained a single hidden layer (20 neurons), fully-connected ReLU-based NN, using the top-5 most essential genetic features from the dataset, extracted using the Minimum Redundancy and Maximum Relevance (mRMR) feature selection technique (Khan et al., 2018). A learning rate of 0.5 for 40 epochs, followed by another 40 epochs with a learning rate of 0.2, was used during training.
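For reference, a Keras sketch consistent with the described Leukemia classifier is given below; the optimizer, loss, and placeholder data are our assumptions, since only the architecture, feature count, and learning-rate schedule are specified above.

```python
import numpy as np
from tensorflow import keras

# Placeholder arrays standing in for the mRMR-reduced Leukemia training data
# (38 samples, 5 features, binary labels); replace with the real dataset.
x_train = np.random.rand(38, 5).astype("float32")
y_train = np.random.randint(0, 2, size=38)

# Single hidden layer with 20 ReLU neurons, 5 selected features, 2 classes.
model = keras.Sequential([
    keras.layers.Input(shape=(5,)),
    keras.layers.Dense(20, activation="relu"),
    keras.layers.Dense(2, activation="softmax"),
])

# Learning-rate schedule from the text: 40 epochs at 0.5, then 40 epochs at 0.2.
for lr, epochs in [(0.5, 40), (0.2, 40)]:
    model.compile(optimizer=keras.optimizers.SGD(learning_rate=lr),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=epochs, verbose=0)
```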

We also experimented on the Iris dataset (Dua & Graff, 2017; Fisher, 1936), which is a multi-label dataset, with characteristics of three iris plant categories as input features. The dataset has an equal number of inputs from all output classes. We split the dataset into training and testing datasets, with 120 and 30 inputs, respectively, while ensuring an equal number of inputs from all classes in each dataset. A fully-connected ReLU-based two-hidden layer (15 neurons each) NN was trained with a learning rate of 0.001 for 80 epochs, using a training to validation split of 4:1.

Since UnbiasedNets is a data-centric bias alleviation framework, we compare the framework to well-acknowledged open-source state-of-the-art data-centric approaches: RUS, ROS, SMOTE (Chawla et al., 2002) and ADASYN (He et al., 2008). The Python toolbox imbalanced-learn implements all of the aforementioned techniques, except RUS, and was used for the generation of testing datasets. Since these approaches require the number of inputs to be different in each class, \(50\%\) of the inputs from the Iris dataset were randomly selected to create a sub-dataset with an unequal number of inputs for the classes. RUS was implemented in MATLAB, removing inputs from the class with more inputs to ensure both classes have the same number of inputs in the case of the Leukemia dataset, and removing \(25\%\) of the samples from each class in the case of the Iris dataset. To avoid overfitting during retraining of NNs using augmented datasets, the number of training epochs was reduced proportionally to the increase in the size of the datasets.
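For reproducibility, the baseline augmented datasets can be produced with imbalanced-learn's standard resamplers; a sketch with our own variable names and seeds follows.

```python
from imblearn.over_sampling import RandomOverSampler, SMOTE, ADASYN

def augment_baselines(X_train, y_train, seed=0):
    """Produce ROS-, SMOTE-, and ADASYN-augmented copies of an imbalanced
    training set, as used for the baseline comparisons."""
    samplers = {
        "ROS": RandomOverSampler(random_state=seed),
        "SMOTE": SMOTE(random_state=seed),
        "ADASYN": ADASYN(random_state=seed),
    }
    return {name: s.fit_resample(X_train, y_train) for name, s in samplers.items()}
```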

All NNs considered in the experiments were trained to the training and testing accuracies of over \(90\%\). In addition, the experiments for each bias alleviation approach were repeated 10 times to ensure conformity.

6 Results and analysis

This section elaborates on the empirical results obtained from our experiments followed by comparison and analysis of UnbiasedNets to the data-centric bias alleviation approaches.

6.1 Observations

As the number of output classes increases, ensuring an unbiased NN becomes a more challenging task. This was clearly observed in our experiments (Table 2), wherein the multi-label classifiers had a higher bias and at the same time, their bias reduction was substantially less effective in all bias alleviation approaches.

Table 2 Comparison of \({\mathcal {B}}_R\) values (average ± standard deviation) obtained for the NNs trained on original and diversified datasets, using open-source state-of-the-art approaches and UnbiasedNets

As discussed in Sect. 3, a lower \({\mathcal {B}}_R\) indicates that the difference in the ratios of misclassified to correctly classified inputs is low, implying that the NN is less biased towards any output class. As summarized in Table 2, our UnbiasedNets framework outperformed all the DC bias alleviation techniques while obtaining the best \({\mathcal {B}}_R\) values for both binary and multi-label datasets. Moreover, in the case of the Iris dataset, using classical data-centric approaches to generate a dataset with an equal number of inputs from each class seems to exacerbate the robustness bias. Although UnbiasedNets may not always reduce the robustness bias, the data diversification ensures that the dataset remains balanced.

This success of bias alleviation can also be seen in Fig. 5, which shows the variation in \({\mathcal {B}}_R\) values over the repeated experiments. It is clearly evident that the individual experiments leading to a decrease in the average robustness bias far outnumber those leading to an increase. Hence, we advocate executing several instances of the experiments in order to obtain dataset instances that offer the best bias alleviation.

Additionally, it can be seen from the box plots that NNs trained using the UnbiasedNets datasets demonstrate considerably smaller interquartile ranges and the lowest average \({\mathcal {B}}_R\) values. Even though RUS illustrates competitive \({\mathcal {B}}_R\) values, the use of RUS is not appropriate for small datasets, since the approach involves the deletion of real input samples and may hence diminish the learning capability of the NN. The remaining approaches, i.e., ROS, SMOTE, and ADASYN, present a large variation in \({\mathcal {B}}_R\) results, rendering the approaches less effective for the alleviation of robustness bias.

Fig. 5

Variation in \({\mathcal {B}}_R\) results for NNs trained on RUS, SMOTE, ADASYN, ROS, and the diversified UnbiasedNets datasets

6.2 Analysis

Our work focuses on robustness bias, which a trained NN exhibits when inputs from certain output classes are more robust to noise than inputs from other classes. From our experiments, we confirm the hypothesis that having an equal number of inputs (as in the case of the Iris dataset) is in fact insufficient to ensure an unbiased network.

In the case of datasets where the number of inputs in each class is different, known approaches like RUS, ROS, SMOTE, and ADASYN may reduce the bias. But for most datasets, they may be inadequate for robustness bias alleviation, mainly for two reasons: (1) they rely on the naive definition of balanced datasets and only ensure that the number of inputs for each class is equal, which overlooks the requirement of each class to be equally-represented in the input domain (concept explained in Sect. 3.1), and (2) during data augmentation, new inputs are only added in between the existing inputs, which neither diversifies the dataset sufficiently nor ensures that the new inputs are valid candidates for the augmented dataset. UnbiasedNets, on the other hand, uses counterexample analysis from the bias detection stage to obtain the required number of inputs in each class for a potentially equally-represented dataset. It also uses noise tolerance, which allows us to diversify the data beyond the bounds of the existing training dataset, and the result is subsequently validated by leveraging feature correlations, to alleviate bias in the NN.

In the case of the Iris dataset, ROS and SMOTE were observed to significantly worsen the robustness bias. This may be partially due to the deletion of inputs from the dataset to create an unequal number of inputs in the classes, which reduces the data available for NN training. However, RUS retained the \({\mathcal {B}}_R\) value close to that of the original dataset, even though the approach also employs input deletion. This suggests that the data augmentation by ROS and SMOTE may actually contribute to an exacerbation of bias rather than its alleviation. In the case of UnbiasedNets, even though the improvement in bias is often small, the results clearly suggest that diversifying the training dataset by adding realistic synthetic inputs and reducing redundancy in the dataset is a potential direction to alleviate bias in NNs, unlike the other approaches.

7 Discussion

UnbiasedNets aims to diversify the dataset so as to (potentially) achieve a balanced dataset. While the diversification goal for obtaining a completely unbiased network may not always be achieved, UnbiasedNets rarely aggravates the bias due to its precise perception of balanced datasets, unlike existing DC techniques. This section discusses the various aspects of NNs, which contribute to the challenge of data diversification and ultimately the persisting bias in trained networks.

7.1 Input resemblance

As seen from Table 2, the greater the number of output classes, the higher the robustness bias in the NN. This implies that the higher the number of output classes, the more likely the dataset is imbalanced, and the less likely it is to obtain a trained NN that is equally robust for all output classes. A likely explanation for this could in fact be a close resemblance of inputs from the different classes, for datasets with a higher number of output classes.

Fig. 6

Inputs from one output class may resemble inputs from other classes, as observed in the MNIST dataset

For instance, consider the case of hand-written digits (from the MNIST dataset), which comprises 10 output classes. As shown in Fig. 6, it is possible for inputs from some classes to closely resemble inputs from other classes—for example, the digit 0 may resemble a 6, and the digit 2 may resemble a 3. With inputs having a likely resemblance to multiple classes, it is challenging to generate realistic synthetic inputs, and hence to obtain successful data diversification for reducing the bias.

A more careful study of the example provided above also reveals that the difference between the closely resembling inputs blurs when their semantic distance is smaller (Kenett, 2019), as shown in Fig. 6. Yet, the syntactic rules for output classification stay intact even for these closely resembling inputs. For instance, a single loop forms the digit 0, while an arc of a length comparable to half the circumference of the loop is required in addition to the loop to syntactically define the digit 6. Hence, the addition of such syntactic rules for the generation of synthetic inputs [similar to the approach taken in neuro-symbolic learning (Sarker et al., 2021)] may improve the data diversification.

7.2 Curse of dimensionality

Another challenge to data diversification is the large number of input neurons comprising the NN inputs—a challenge often referred to as the “curse of dimensionality” in the NN analysis literature (Wu et al., 2020). This implies that as the number of input neurons for the NN increases, the computational requirements for its analysis increase exponentially.

To understand this from the perspective of data diversification, let us consider the example of an image dataset. Data diversification determines input feature bounds directly from the raw input data to generate inputs such that the synthesized inputs x belong to the valid input \({\mathcal {D}}\), i.e., \(x\in {\mathcal {D}}\). However, various transformations, like affine, homographic and photometric transforms associated with image inputs may tremendously change the inputs, while still keeping the inputs realistic (Pei et al., 2017). Hence, for a practical image dataset, inputs belonging to even a single output class will have individual inputs that have undergone different transformations. As a result, the bounds of each input feature obtained from the inputs, for such a dataset, will be very large. This hinders the generation of synthetic data using these bounds, in turn making the data diversification halt at the data validation step since the search input space is too large for the randomly generated inputs to be realistic. (See Appendix B for details on the experimental analysis carried out to test the stated hypothesis on a real-world image dataset, MNIST.)

Towards this end, appropriate input pre-processing and the use of feature correlation knowledge to determine the bounds of the correlated input features (rather than raw input features) could potentially extend the applicability of UnbiasedNets framework to a larger variety of datasets.

8 Conclusion

The overall performance of Neural Networks (NNs), particularly those relying on supervised training algorithms, is largely dependent on the training data available. However, the data used to train NNs may often be biased towards specific output class(es), which may propagate as robustness bias in the trained NN. But, unlike checking the testing accuracy of the trained NN, determining the bias in a NN is not a straightforward task. Existing works often rely on large datasets and aim at addressing biases by ensuring an equal number of inputs from each output class. However, as shown by our detailed experiments, such approaches are not always successful. This paper proposes a novel bias alleviation framework, UnbiasedNets, which initially detects and quantifies the extent of bias in a trained NN and then uses a methodological approach to diversify the training dataset by leveraging the NN’s noise tolerance and K-means clustering. To the best of our knowledge, this is the first framework specifically addressing the robustness bias problem. We show the efficacy of UnbiasedNets using both binary and multi-label classifiers in our experiments, and also demonstrate how existing bias alleviation approaches may rather exacerbate the bias instead of alleviating it. We also discuss the challenges in robustness bias alleviation for certain datasets, and elaborate on potential future research directions for addressing the robustness bias problem in trained NNs.