1 Introduction

Convolutional neural networks (CNNs) are increasingly being used in high-stakes fields such as medical diagnosis (Tjoa & Guan, 2020) and autonomous driving (Levinson et al., 2011). Yet, their decisions can be opaque, which makes it challenging for machine learning (ML) algorithm developers to diagnose and improve model performance. The black-box nature of neural networks has spawned the field of explainable Artificial Intelligence (XAI), which seeks to develop techniques that interpret and explain models in order to increase trust, verifiability, and accountability (Gunning & Aha, 2019). The term Interpretable Machine Learning (IML) is sometimes used to distinguish such techniques from methods that offer an “explanation”, a concept with a rich history in the social sciences, where human interaction and human-subject studies are used to evaluate explanation quality (Miller, 2019; Hoffman et al., 2018). We use both terms (XAI and IML) in this paper, since no clear, commonly agreed upon definition of explanation and interpretation exists (Linardatos et al., 2020).

The perceived need to enhance the interpretability of deep learning models has resulted in a number of techniques that are specifically targeted to fields that use ML, such as medicine (Bruckert et al., 2020), air traffic control (Xie et al., 2021), finance (Chen et al., 2018), and autonomous driving (Levinson et al., 2011). However, there is a paucity of IML techniques that have been explicitly adapted for imbalanced data.

Interpretation is critical to both imbalanced learning and IML, although the two fields have approached it from different perspectives. IML has generally focused on model interpretability, whereas imbalanced learning has sought to better understand data complexity: it has typically examined the interplay of class imbalance with class overlap, sub-concepts and outliers, because imbalance can exacerbate latent feature entanglement, class overlap, and the impact of noisy instances on classifiers (Denil & Trappenberg, 2010; Prati et al., 2004; Jo & Japkowicz, 2004). In addition, many IML techniques seek to explain model decisions with respect to specific instances, whereas imbalanced learning is generally concerned with the global properties of entire classes.

In this work, we combine facets of both fields into a single framework to better understand a CNN’s predictions with respect to imbalanced data. We do not develop a single method to improve the interpretability of complex, imbalanced datasets. Rather, we propose a framework and suite of tools that can be used by both model developers and users to better understand imbalanced data and how a deep network acts on it.

In this paper, we make the following research contributions to the field of imbalanced learning:

  • Framework for understanding high-dimensional imbalanced data. Many existing imbalanced learning techniques that assess data complexity are designed for binary classification on low-dimensional data and shallow ML models. Because we use the low-dimensional latent representations (Sect. 3.1) learned by a CNN, we are able to provide a suite of tools (Sect. 3) that efficiently visualize concepts that are central to imbalanced learning: class prototypes and sub-concepts (Sect. 3.2) and class overlap (Sect. 3.4).

  • Prediction of relative false positives by class during inference using training data. We show that the classes likely to produce the most false positives during inference for a given reference class can be predicted from the training data (Sect. 3.3).

  • Class saliency color visualizations. Existing IML methods display black and white heatmaps of pixel saliency for single dataset instances. We, instead, visualize the most salient colors used by CNN models to identify entire classes. Similar to IML saliency methods, we use the gradient of individual instances to map decisions to input pixels; however, we aggregate this information efficiently across all instances in large datasets by using color prototypes and latent feature embeddings (Sect. 3.5).

2 Background and related work

In this section, we introduce the guiding principles in IML and imbalanced learning that animate our framework:

  • Data is an important element of model understanding. Advances in deep learning have been built, in part, on access to large amounts of data. Therefore, it is critical to understand how the model organizes data into low dimensional representations used for classification.

  • Need for global data complexity insights to explain deep networks. Many current IML methods are instance-specific; whereas imbalanced learning explanation requires intuition about global (class) characteristics.

  • CNN texture bias as interpretation. The perceived texture bias of CNNs can be used to extract informative global, class-wise insights.

We also discuss the prior work that inspires our research and how our approach differs from previous methods.

2.1 Centrality of data to deep learning and class imbalance understanding

Deep learning has shown significant progress in the past decade due, in part, to the ubiquity of low cost and freely available data (Marcus, 2018). Deep networks are typically trained on thousands and even millions of examples to minimize the average error on training data (empirical risk minimization) (Zhang et al., 2018). As the size and complexity of modern datasets grow, it is increasingly important to provide model users and developers with vital information and visualizations of representative examples that carry interpretative value (Bien & Tibshirani, 2011). In addition, when deep networks fail on imbalanced data, it is not always intuitive to diagnose the role of data complexity on classifier performance (Kabra et al., 2015).

In imbalanced learning, several studies have assessed the complexity of the data used to train machine learning models; however, many of these studies were developed for small-scale datasets and shallow models. Barella et al. (2021) provide measures to assess the complexity of imbalanced data. Their package is written for binary classification and is based on datasets with 3000 or fewer instances and fewer than 100 features. Batista et al. (2004) determined that complexity factors such as class overlap are compounded by data imbalance. Their study was performed with respect to binary classification on datasets with 20,000 or fewer examples and 60 or fewer features. Their conclusion that class overlap is a central problem when studying class imbalance was confirmed by Denil and Trappenberg (2010), Prati et al. (2004) and García et al. (2007). Rare instances, class sub-concepts and small disjuncts can also exacerbate data imbalance, add to data complexity and contribute to classifier inaccuracy (Jo & Japkowicz, 2004; Weiss, 2004; Aha, 1992). Ghosh et al. (2022) explore a geodesic, prototype-based ensemble that preserves interpretability on a synthetic dataset, a non-public dataset with 496 features, and a public dataset with 13 features and fewer than 1000 instances, although their visualizations focus solely on decision boundaries.

Therefore, understanding data complexity, including class overlap, rare, border and outlier instances, is critical to improving imbalanced learning classifiers. This is especially important in deep learning, where opaque models trained with batch processing may obscure underlying data complexity (Ras et al., 2022; Burkart & Huber, 2021). Unlike prior work, which explained data complexity by examining model inputs, we explain data complexity via the latent features learned by a model. These low-dimensional representations are the raw material used by the final classification layer of CNNs to make their predictions.

2.2 Global (class) vs. instance level interpretation

Several studies have shown that interpretation is critical to machine learning model user satisfaction and acceptance (Teach & Shortliffe, 1981; Ye & Johnson, 1995). It is also important for model developers for diagnostic and algorithm improvement purposes. Explanation is central to both IML and imbalanced learning; however, these fields approach it in different ways.

In IML, great strides have been made to increase model interpretability by describing the inner workings of models and justifying how or why a model arrived at its prediction (post-hoc explanation) (Kenny et al., 2021). IML techniques can be roughly divided into four groups.

First, there are methods that explain a model’s predictions by attributing decisions to inputs, including pixel attribution through back-propagation (Simonyan et al., 2013; Selvaraju et al., 2017; Sundararajan et al., 2017; Zhou et al., 2016). These methods generally work on single data instances and do not provide an overall view of class homogeneity, sub-concepts, or outliers (Huber et al., 2021). For example, CAM (Zhou et al., 2016), GRAD-CAM (Selvaraju et al., 2017) and pixel propagation (Simonyan et al., 2013) all highlight the most important pixels that a model uses to predict a single instance of a class. In contrast, our methods show the most relevant feature embeddings and colors for entire classes (i.e., all instances in a class).

Second, explanations by example provide evidence of the model’s prediction by citing or displaying similarly situated instances that produce a similar result or through counter-factuals—instances that are similar, yet produce an opposite or adversary result (Lipton, 2018; Keane & Kenny, 2019; Artelt & Hammer, 2019, 2020; Mothilal et al., 2020). Like pixel attribution methods, this approach only provides explanations for single instances or predictions.

Third, there are methods that explain a complex neural network by replacing, or modifying, it with a simpler model. These approaches include local interpretable model explanations (LIME) (Ribeiro et al., 2016), Shapley values (occlusion-based attribution) (Shapley, 1953), the incorporation of the K-nearest neighbor (KNN) (Fix & Hodges, 1989; Cover & Hart, 1967) algorithm into deep network layers (Papernot & McDaniel, 2018), and decision boundary visualizations. Both LIME and Shapley values can be computationally expensive because they involve repeated forward passes through a model (Achtibat et al., 2022). Methods that visualize decision boundaries, such as DeepView (Schulz et al., 2019), often rely on another model [e.g., UMAP (McInnes et al., 2018)] for dimensionality reduction and select a subset of a dataset to produce scatter plots. In contrast, our methods globally utilize a CNN’s internal representations for all instances in a training or test set to visualize classes that overlap (including the percentage of overlap), display class sub-concepts, and the most relevant colors that a CNN uses to distinguish an entire class.

Finally, there are IML methods that extract rules learned by a model (Zilke et al., 2016) and the features or concepts represented by individual filters or neurons (Gilpin et al., 2018).

In summary, many existing IML methods offer interpretations for single instances and do not describe the broad class characteristics learned, or used, by a neural network to arrive at its decision. By contrast, in imbalanced learning, the focus of most explanatory methods has been on the global properties of data and classes within a dataset, including the interplay of class imbalance and data complexity factors, such as class overlap, sub-concepts and noisy examples.

In imbalanced learning, Napierala and Stefanowski partitioned minority classes into instances that were homogeneous (safe), residing on the decision boundary (border), rare, and outliers (Napierala & Stefanowski, 2016). We extend their method to both majority and minority classes and use a model’s latent representations to identify instance similarity based on the local neighborhood, instead of using the input space.

2.3 CNN texture bias as explanation

Recent work has demonstrated that CNNs emphasize texture over shape for object recognition tasks (Geirhos et al., 2018; Baker et al., 2018; Hermann et al., 2020). A precise definition of texture remains elusive (Haralick, 1979). Due to the difficulty of precisely defining texture, we focus on one of its properties—color or chromaticity of a region. We use a CNN’s color bias as explanation. As discussed in more detail in Sect. 3.5, we combine both saliency maps and pixel aggregation to reveal the most prevalent colors that a CNN relies on to distinguish a class.

3 IML framework for imbalanced learning

In this section, we outline our framework for applying IML to complex, imbalanced data. Our framework is built on feature embeddings (FE). It starts broadly by visualizing sub-concepts within classes, which we refer to as archetypes. Then, we use nearest adversary classes to gauge error during inference. Next, we visualize class overlap. Finally, our framework allows for zooming in on specific classes to view the most salient colors that define a class. The basic components of the framework are shown graphically in Fig. 1. They can be applied sequentially and in their entirety, or individually, depending on the user’s needs. Each component is discussed below.

Fig. 1

Outline of the main components of our IML framework for imbalanced learning

3.1 Feature embeddings

To make our analysis of imbalanced data complexity more tractable, we work with the low dimensional feature embeddings learned by a CNN. We select the latent representations in the final convolutional layer of a CNN, after pooling. We refer to these features as feature embeddings (FE). FE can be extracted from a trained CNN and used to analyze dataset complexity and to better understand how the model acts on data. FE drawn from the final layer of CNNs capture the central variance in data (Bengio et al., 2013). In computer vision, it has similarly been hypothesized that high dimensional image data can be expressed in a more compact form, based on latent features (Brahma et al., 2015). We use these features, instead of prediction confidence, because neural networks can lack calibration and display high confidence in false predictions (Guo et al., 2017).
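To make the extraction step concrete, the following is a minimal PyTorch sketch of pulling pooled final-convolutional-layer embeddings from a trained network. It assumes a torchvision-style ResNet with a global average pooling module named `avgpool` and a stand-in data loader; the backbone and layer name are illustrative assumptions, not the exact models used in our experiments.

```python
import torch
import torchvision

# Hypothetical backbone and layer name; the paper uses ResNet-32/56 variants,
# but any CNN with a pooled final convolutional layer can be hooked this way.
model = torchvision.models.resnet18(weights=None)
model.eval()

features, labels = [], []

def hook(module, inputs, output):
    # Pooled feature maps: (batch, channels, 1, 1) -> (batch, channels) = FE
    features.append(output.flatten(start_dim=1).detach())

handle = model.avgpool.register_forward_hook(hook)

# Stand-in loader with random images; replace with a real DataLoader.
loader = [(torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))) for _ in range(4)]

with torch.no_grad():
    for images, targets in loader:
        model(images)
        labels.append(targets)

fe = torch.cat(features)      # (num_instances, num_feature_maps)
y = torch.cat(labels)
handle.remove()
```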

3.2 Class archetypes

We divide each class into four sub-categories, or archetypes: safe, border, rare, and outliers. The archetypes are inspired by Napierala and Stefanowski (2016). Each instance’s category is determined by its local neighborhood, which we compute with K-nearest neighbors (KNN) on FE rather than on input features.

More broadly, the four archetypes facilitate model, dataset and class complexity understanding. We use \(K=5\) to determine the local neighborhood. Our selection of 5 neighbors is consistent with imbalanced resampling methods such as SMOTE (Chawla et al., 2002) and its many variants, as well as Napierala and Stefanowski (2016). Using fewer than 5 neighbors would be challenging with 4 archetypes. More neighbors can be used; however, this may prove difficult for minority classes that have few instances (e.g., in our experiments, the minority class in CIFAR-10 has only 50 instances). The “safe” category contains instances whose nearest neighbors are predominantly from the same class (\(N_c=4\) or \(N_c=5\), where \(N_c\) is the number of same-class neighbors); they are therefore likely homogeneous. The border category contains instances that have both same-class and adversary-class nearest neighbors (\(N_c=2\) or \(N_c=3\)) and that likely reside at the class decision boundary. The rare category represents class sub-concepts (\(N_c=1\)). Finally, the outlier category contains instances that have no same-class neighbors (\(N_c=0\)). For the majority class, outliers may indeed represent noisy instances, whereas for the minority class, the model may classify more instances as outliers due to the reduced number of training examples and its inability to disentangle their latent representations from adversary classes. The four archetypes can be used to select prototypes that can be visualized and further inspected (see Sect. 4.2).
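As a concrete illustration of the archetype assignment, the sketch below applies the \(N_c\) thresholds above to feature embeddings with scikit-learn's KNN; the function name and the toy data are ours, not part of the original implementation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def class_archetypes(fe, labels, k=5):
    """Label each instance safe/border/rare/outlier from the number of
    same-class neighbors (N_c) among its k nearest neighbors in FE space."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(fe)
    _, idx = nn.kneighbors(fe)                      # column 0 is the point itself
    n_c = (labels[idx[:, 1:]] == labels[:, None]).sum(axis=1)
    archetype = np.where(n_c >= 4, "safe",
                np.where(n_c >= 2, "border",
                np.where(n_c == 1, "rare", "outlier")))
    return archetype

# Toy usage with random embeddings; real FE come from the trained CNN (Sect. 3.1)
fe = np.random.rand(500, 64)
labels = np.random.randint(0, 10, size=500)
print(dict(zip(*np.unique(class_archetypes(fe, labels), return_counts=True))))
```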

3.3 Nearest adversaries to visualize false positives by class


Algorithm 1: Nearest Adversaries

We believe that the local neighborhood of training instances, determined in latent space, contains important information about class similarity and overlap. During training, if a CNN embeds two classes in close proximity in latent space, then the model will likely have difficulty disentangling its representations of the two classes during inference (Dablain et al., 2023). This failure to properly separate the classes during training will likely lead to false positives at validation and test time. Based on this insight, we extract feature embeddings (FE) and their labels from a trained model and use the KNN algorithm to find the K-nearest neighbors of each training instance. If an instance produces a false positive during training, we collect and aggregate the number of nearest adversary-class neighbors for each reference class. See Algorithm 1.
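A minimal sketch of this bookkeeping is given below, assuming FE, true labels, and training-time predictions are available as arrays. Whether the reference class is indexed by the true or the predicted label of a misclassified instance follows our reading of the text, and the names are ours rather than the exact Algorithm 1.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def nearest_adversaries(fe, labels, preds, k=5):
    """For misclassified training instances, count the adversary classes that
    appear among their k nearest FE neighbors, aggregated per reference class,
    and return the normalized distribution for each reference class."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(fe)
    _, idx = nn.kneighbors(fe)
    counts = {c: {} for c in np.unique(labels)}
    for i in np.where(preds != labels)[0]:          # instances the model got wrong
        ref = labels[i]                             # reference class (true label here)
        for j in idx[i, 1:]:                        # skip the instance itself
            adv = labels[j]
            if adv != ref:
                counts[ref][adv] = counts[ref].get(adv, 0) + 1
    dist = {}
    for c, d in counts.items():
        total = sum(d.values())
        dist[c] = {a: n / total for a, n in d.items()} if total else {}
    return dist
```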

In Sect. 4.3, we show visualizations of this technique and how it correlates with validation set false positives using the Kullback–Leibler Divergence. In addition, we compare our method to another measure of class overlap used in imbalanced learning, Fisher’s Discriminant Ratio (FDR):

$$\begin{aligned} FDR=\frac{(\mu _{{FE}_i} - \mu _{{FE}_k})^2}{\sigma _{{FE}_i}^2 + \sigma _{{FE}_k}^2} \end{aligned}$$
(1)

In Eq. (1), i and k represent pair-wise classes in a dataset and FE is a vector of feature embeddings, where the mean squared difference of latent features (FE) is divided by their variance. As used in Barella et al. (2021), FDR is a measure of how close two classes are, with lower values indicating greater similarity (feature overlap). Thus, like our nearest adversary technique, it can be used to determine class overlap.
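For reference, a simple per-pair implementation of Eq. (1) over FE vectors might look as follows; how the per-dimension ratios are aggregated (a sum here) is an assumption, since implementations such as Barella et al. (2021) differ in this detail.

```python
import numpy as np

def fisher_discriminant_ratio(fe, labels, class_i, class_k, eps=1e-12):
    """Pairwise FDR (Eq. 1) between classes i and k computed on feature
    embeddings; lower values indicate greater class similarity (overlap)."""
    a, b = fe[labels == class_i], fe[labels == class_k]
    num = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    den = a.var(axis=0) + b.var(axis=0) + eps       # avoid division by zero
    return float((num / den).sum())                  # aggregation choice: sum over FE dims
```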

3.4 Identify specific class feature map overlap

In the previous section, we examined class overlap at the instance level. Here, we focus on overlapping class latent features. Each feature embedding (FE) represents the scalar value of a convolutional feature map (FM), after pooling. These FE/FM are naturally indexed and can be extracted in vector form. This natural indexing allows us to identify the FE with the highest magnitudes across an entire class.


Algorithm 2: Identify Specific Class Feature Map Overlap

For each class, the FE magnitudes can be aggregated and averaged. Then, the FE with the largest magnitudes can be selected (the top-K FE). If two classes place a high magnitude on a FE/FM with the same index position, then this FE is important for both classes and hence, may indicate feature overlap. See Algorithm 2. In Sect. 4.4, we provide visualizations of this method, along with suggestions for how it can be used by model users and developers.
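The overlap check itself reduces to a set intersection over top-K FE indices, as in the hedged sketch below (the function and variable names are ours).

```python
import numpy as np

def top_k_fe_overlap(fe, labels, class_a, class_b, k=10):
    """Sketch of Algorithm 2: average FE magnitudes per class, keep the K
    largest indices, and return the feature-map indices shared by both classes."""
    top_a = np.argsort(fe[labels == class_a].mean(axis=0))[-k:]
    top_b = np.argsort(fe[labels == class_b].mean(axis=0))[-k:]
    return sorted(set(top_a.tolist()) & set(top_b.tolist()))
```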

3.5 Colors that define classes

Existing IML methods that trace CNN decisions to pixel space via gradient techniques track salient pixel locations for single image instances. They typically display a black-and-white rendering of the source image (black indicating pixels with high saliency for the CNN’s prediction and white indicating low saliency). We make use of a gradient saliency technique commonly used in IML, developed by Simonyan et al. (2013). However, we modify it to trace a prediction to a pixel location only, so that we can extract the RGB pixel values at that location.


Algorithm 3: Colors that Define Classes

We collect the top-K RGB pixel values for each instance in a training set and also the instance labels. For purposes of our illustrations in Sect. 4.5, we select the top 10% most salient pixels. We partition the collected pixels into bins based on the color spectrum (e.g., black, orange, red, green, blue, light blue, white, etc.). See Algorithm 3.
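A hedged sketch of this pipeline is given below: it uses a plain gradient-saliency pass in PyTorch to rank pixel locations, keeps the top 10%, and bins the RGB values at those locations by nearest reference color. The reference colors and the use of the maximum-channel gradient as pixel saliency are illustrative assumptions; the paper's exact spectrum partition is not reproduced here.

```python
import numpy as np
import torch

# Illustrative reference colors for binning (assumed; not the paper's exact bins)
REFS = {
    "black": (0, 0, 0), "gray": (128, 128, 128), "white": (255, 255, 255),
    "red": (200, 30, 30), "orange": (240, 140, 20), "green": (40, 160, 60),
    "blue": (30, 60, 200), "light blue": (140, 190, 240), "brown": (140, 90, 40),
}

def salient_color_counts(model, images, top_frac=0.10):
    """Count the colors of the most salient pixels across a batch: gradient
    saliency (Simonyan-style) ranks pixel locations, then the RGB values at
    the top locations are assigned to the nearest reference color."""
    images = images.clone().requires_grad_(True)      # assumes images scaled to [0, 1]
    model(images).max(dim=1).values.sum().backward()  # gradient of the top class score
    sal = images.grad.abs().max(dim=1).values         # (N, H, W) per-pixel saliency
    refs = np.array(list(REFS.values()), dtype=float)
    counts = {name: 0 for name in REFS}
    for n in range(images.shape[0]):
        s = sal[n].flatten()
        k = max(1, int(top_frac * s.numel()))
        idx = torch.topk(s, k).indices
        rgb = images[n].detach().flatten(1)[:, idx].T.numpy() * 255.0   # (k, 3)
        nearest = np.linalg.norm(rgb[:, None, :] - refs[None], axis=2).argmin(axis=1)
        for r in nearest:
            counts[list(REFS)[r]] += 1
    return counts
```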

4 Experiments and results

4.1 Experimental set-up

To illustrate the application of our framework, we select five image datasets: CIFAR-10, CIFAR-100 (Krizhevsky, 2009), Places-10, Places-100 (Zhou et al., 2017) and INaturalist (Van Horn et al., 2018). For each dataset, we use different types and levels of imbalance to highlight varied applications of our framework (see Table 1 for dataset details). For purposes of our experiments, we consider three types of imbalance: exponential (exp.), step and natural. Exponential imbalance reduces class sizes gradually across the classes, step imbalance imposes a sharp drop in the number of instances between groups of classes, and natural imbalance arises from the data collection process itself (unique to the INaturalist dataset).
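To make the imbalance profiles concrete, the following sketch generates per-class sample counts for the exponential and step settings; the exact ratios and the majority/minority split for the step profile are assumptions that mirror common practice (e.g., the exponential profile of Cao et al. (2019)) rather than the precise values in Table 1.

```python
def class_sizes(n_max, num_classes, kind="exp", ratio=100.0, majority_frac=0.5):
    """Per-class sample counts for exponential and step imbalance.
    Exponential: sizes decay geometrically from n_max to n_max/ratio.
    Step: the first majority_frac of classes keep n_max, the rest get n_max/ratio."""
    if kind == "exp":
        return [int(n_max * (1.0 / ratio) ** (c / (num_classes - 1)))
                for c in range(num_classes)]
    if kind == "step":
        cut = int(num_classes * majority_frac)
        return [n_max if c < cut else int(n_max / ratio) for c in range(num_classes)]
    raise ValueError("natural imbalance is taken from the dataset as collected")

print(class_sizes(5000, 10, "exp"))   # e.g., CIFAR-10 with a 100:1 ratio: 5000 ... 50
```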

To make training tractable, we limit Places to 10 and 100 classes and INaturalist to its 13 super-categories. CIFAR-10 and CIFAR-100 are trained with a Resnet-32 (He et al., 2016) backbone and Places and INaturalist with a Resnet-56. Although a Resnet architecture is used for our experiments, any CNN architecture that imposes dimensionality reduction should work (e.g., a DenseNet (Huang et al., 2017) likely would not facilitate the use of lower dimensional feature embeddings). We adopt a training regime employed by Cao et al. (2019). Except where noted, all models are trained with cross-entropy loss on a single NVIDIA 3070 GPU. As discussed in the following sections, in several cases we train models with a cost-sensitive method (LDAM) (Cao et al., 2019) to show how visualizations of both baseline and cost-sensitive algorithms can be compared to assess specific areas of improvement.

Table 1 Datasets and training

4.2 Class archetypes

Figure 2 shows the percentage of true positives (TP) for each class in 5 training datasets. The TPs are grouped based on class archetypes: safe, border, rare and outliers. For all of the datasets, the safe and border groups contain the greatest percentage of TPs relative to the total number of instances in the group.

Fig. 2

This figure shows the percentage of True Positives (TPs) of the safe, border, rare and outlier archetypes in each training set. For Places-100, the classes with the 5 largest number of examples and the 5 fewest are shown to make the visualization interpretable. In the sub-figure legends, the class with the greatest number of instances is at the top (majority) and the class with the fewest instances is at the bottom (minority)

In Fig. 3, we select a prototypical instance from the safe, border, rare and outlier categories for the majority class (airplanes) and the minority class (trucks) from CIFAR-10 for visualization purposes. In large image datasets, it may not be obvious which examples are representative of the overall class (safe examples), which examples reside on the decision boundary (border), and which instances may be sub-concepts or outliers. For each of these categories and classes, we select the most central prototype, using the K-medoid algorithm, and visualize them.
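For illustration, selecting a single most-central prototype per archetype can be approximated with a medoid computation like the one below (a simplification of full K-medoid clustering; the function name is ours).

```python
import numpy as np

def medoid_index(fe_subset):
    """Index of the medoid: the instance minimizing the summed Euclidean
    distance to all other instances in the (class, archetype) subset."""
    d = np.linalg.norm(fe_subset[:, None, :] - fe_subset[None, :, :], axis=2)
    return int(d.sum(axis=1).argmin())

# Usage: pass the FE of, e.g., all "safe" airplane instances, then visualize
# the training image at the returned index.
```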

Fig. 3

This figure displays the safe, border, rare and outlier prototypes for 2 classes in the CIFAR-10 dataset. a–d are from the class with the largest number of examples (airplanes) and e–h are from the class with the fewest number of examples (trucks)

These visualizations can help identify potential issues that require further investigation of specific classes. For example, in the case of airplanes, it may not be immediately apparent to a human, who privileges shape over texture, why the outlier example differs from the safe prototype in Fig. 3. This apparent incongruity can serve as a flag for model users and algorithm developers. As discussed in more detail in Sect. 4.5, we conjecture that it is due to a CNN’s preference for texture over shape when distinguishing classes (i.e., there is no blue sky in the outlier airplane prototype).

Use cases For model users, the four categories facilitate the visualization of representative sub-groups within specific classes. When dealing with large datasets, these visualizations can reduce the need to sift through copious examples and instead allow model users to focus on a few representative ones: those that are relatively homogeneous in model latent space (safe), those that reside on the decision boundary (border), and rare and outlier instances. See Fig. 3.

For imbalanced model developers, the class archetypes can help improve the training process. First, majority class outliers could possibly be mislabeled instances that should be removed. In this case, it may be necessary to examine all of the outlier examples, instead of only the prototype. Second, it can inform potential resampling strategies. For example, safe examples, due to their homogeneity, may be ripe for under-sampling; border and rare instances may be good candidates for over-sampling.

4.3 Nearest adversaries to visualize false positives by class

Figure 4 visualizes the relationship between validation error and training nearest adversaries by class for a CNN trained with the INaturalist dataset. In the figure, each class is represented with a single bar. The class names are matched with specific colors in the legend. In the figure on the left (a), each color in each bar stands for an adversary class that the model falsely predicts as the reference class. The length of each color bar represents the percentage of total false positives produced by that adversary class. Figure 4a shows the validation set false positives.

Fig. 4

This figure visualizes the relationship between validation error by class and training nearest adversaries by class. This tool can provide a powerful indication of the classes that a model will struggle with during inference

In contrast, Fig. 4b shows the percentage of adversary-class nearest neighbors for each reference class. By placing these diagrams side by side, we can easily see how nearest adversaries (on the training set) closely reproduce the classes that trigger false positives (in the validation set).

Table 2 KLD of Validation Set False Positives

This tool can provide a powerful indication of the classes that a model will struggle with during inference while only using training data. Model users and developers can employ the figure on the right as a proxy for the diagram on the left. For example, the training nearest adversary neighbors diagram (b) quickly shows, and the validation set diagram (a) confirms, that the model has the most difficulty (FPs) with: Fungi for the Protozoa class, Mammals for the Aves (birds) class, Amphibia for the reptile class, and reptiles for the Amphibia class.

To confirm the ability of training set nearest adversaries to predict the classes for which a model will produce more false positives during validation, we measure the difference between the nearest adversary and validation FP distributions. We use the Kullback–Leibler Divergence (KLD) (Kullback & Leibler, 1951) to measure the difference between these distributions for five datasets. We also compare our nearest adversary prediction with two other methods: a random distribution and Fisher’s Discriminant Ratio. Table 2 shows that our method (NNB) predicts much better than random (by a factor of 1.8 to 34) and compares favorably with another measure of class overlap, Fisher’s Discriminant Ratio (see Sect. 3.3 for a description of FDR). Although FDR may be more accurate in some cases, it only shows pairwise similarity of classes. In contrast, our NNB method visualizes the proportionate similarity of all adversary classes to a reference class, so that a tiered spectrum of overlap for the classes in a dataset can be readily seen, offering a more realistic outlook on the difficulty of the considered dataset.
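The comparison in Table 2 can be reproduced in outline with a KLD between the per-class distributions; a minimal sketch is below, with the direction of the divergence (observed validation FPs against the predicted distribution) being our assumption.

```python
import numpy as np
from scipy.stats import entropy

def kld(observed_fp_dist, predicted_dist, eps=1e-8):
    """Kullback-Leibler divergence between the observed validation FP
    distribution over adversary classes and a predicted distribution
    (nearest adversaries, random, or FDR-derived); lower is better."""
    p = np.asarray(observed_fp_dist, float) + eps
    q = np.asarray(predicted_dist, float) + eps
    return float(entropy(p / p.sum(), q / q.sum()))

# Example: skewed observed FP distribution vs. a uniform (random) prediction
print(kld([0.6, 0.3, 0.1], [1/3, 1/3, 1/3]))
```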

This simple tool is useful because it is an indicator of latent feature entanglement. If a model places a large number of adversary instances in close proximity to the reference class in the training set, then it will likely have difficulty distinguishing these closely embedded classes at validation time.

Use cases This technique can be a powerful tool for imbalanced data. Our method allows model users and imbalanced algorithm developers to gauge the classes that the model will have difficulty with. Therefore, our visualization allows users to reasonably predict the distribution of validation error based solely on the training set.

4.4 Feature map overlap

In the previous section, we visualized class overlap at the instance level. Here, we examine class overlap at the feature embedding (FE) level. FE are scalar values of the output of a CNN’s final convolutional layer, after pooling. Higher valued FE indicate CNN feature maps, in the last convolutional layer, that the model views as more important for object classification purposes.

Fig. 5

This diagram provides a clear indication of class overlap at the feature map level. It shows the top-K (\(K=10\)) latent features (FE) used by the model to predict CIFAR-10 classes. Each of the 10 segments of each bar is color coded, such that gray is the FE with the largest mean magnitude (on the bottom of the bar) and pink is the smallest (top of the bar). In this case, there are a total of 64 feature maps, which correspond to the FE index numbers listed in the bar charts. Each number in the bar chart represents a FE or feature map index

In Fig. 5, we visualize the ten most significant FE for each class in CIFAR-10 (i.e., the ones with the largest mean magnitudes for each class). Each bar represents a class, as shown on the x-axis. Each of the 10 segments of each bar is color coded, such that gray is the FE with the largest mean magnitude (on the bottom of the bar) and pink is the smallest (top of the bar). Each color coded segment of a bar contains a number, which is the index of a FE/FM. For this model, there are 64 FE/FM. The relative size of each segment (y-axis) shows the percentage that each FE magnitude makes up of the top-10 FE magnitudes.

Therefore, the chart shows the most important latent features (feature maps) that a CNN uses for each class to make its class decision. Because the FE indices are shown for each class, they can be compared between classes to identify latent feature overlap.

For example, in Fig. 5a, we can see that trucks and cars contain five common FE in their top-10 most important FE (i.e., FE indices 57, 53, 43, 0 and 44). In contrast, trucks and planes share only 2 top-10 FE (FE indices 43 and 53). Trucks are the class with the fewest number of training examples, with planes the most, and cars the next largest. For trucks, the two classes that produce the most false positives at validation time are cars and planes, respectively (see Fig. 6a). This chart implies that the large number of FPs produced for planes and cars may be due to two different factors. In the case of planes, it seems to be due to numerical differences in the number of training examples because of the low FE overlap; whereas, in the case of cars, it appears to be due to FE overlap.

Fig. 6

This diagram shows the false positives for trucks for CNNs trained with cross-entropy loss (CE) and LDAM

We can further explore this hypothesis by examining how a cost-sensitive algorithm, LDAM, which focuses on the numerical difference in training instances (and not features), behaves in the face of class overlap. In Fig. 5, the model on the left (a) is trained with cross-entropy loss and the model on the right (b) with a popular cost-sensitive method used in imbalanced learning, LDAM. Interestingly, in the figure on the right (b), where the CNN is trained with a cost-sensitive method, there are still five FE shared between the truck and car classes. In fact, if we view figures (a) and (b) of Fig. 6, we can see that LDAM reduces false positives for the plane class but does not have a large impact on the automobile class, likely because it is geared toward addressing instance numerical differences and not latent feature overlap. Thus, although the cost-sensitive method may have partly addressed the class imbalance, it does not appear to have fully addressed feature overlap.

Use cases This visualization can provide vital clues about where a CNN classifier may break down. The cause of FPs may not always be solely class imbalance. Other factors, such as a model’s entanglement of latent features, may be at play. In these situations, imbalanced learning algorithm developers may want to consider techniques that address feature entanglement, instead of solely class numerical imbalance. For example, it may be possible to design cost-sensitive loss functions that assign a greater cost to FE overlap based on FE index commonality between classes.

This visualization may also be used to assess cost-sensitive algorithms. The visualization can help imbalanced learning algorithm developers decide if, for example, cost-sensitive techniques are addressing only class imbalance or, additionally, if their methods improve feature entanglement in latent space [see also (Ghosh et al., 2022; Pazzani et al., 1994)].

4.5 Colors that define classes

This visualization can be used to identify the color bands that are most prevalent in a data class. As an illustration, Fig. 7 shows the color groups of the top 10% most salient pixels for the truck, auto and plane classes in CIFAR-10. In the case of autos and trucks, black (30%) and gray (15%) are the two most common colors. Since all cars and trucks have (black) tires, the presence of this color is not surprising. Even though the number of samples is vastly different between cars and trucks (60:1 imbalance), the overall proportion of color bands is very similar, which tracks the FE space overlap that we previously observed for these two classes (model feature entanglement). In the case of planes, black and gray are still important (17% each); however, there is a much larger percentage of blue, light blue, and white (12.5% each), due to the greater presence of blue sky and white clouds (background). In contrast, white is salient only 5% of the time for cars and trucks.

Fig. 7

This diagram shows the top 10% of color groups for specific classes based on gradient saliency tracing. The classes are drawn from CIFAR-10

Fig. 8

This diagram shows the most salient colors for a majority class (baseball field) and a minority class (raft), along with archetypal images drawn from the safe and border categories, for a CNN trained on the Places-100 dataset. In the case of a baseball field, the model emphasizes green, brown and gray; whereas for rafts, white is more prevalent (likely due to white rapids) and brown and green are less emphasized. This type of information may be relevant for oversampling techniques in pixel space: by determining the colors that the model emphasizes, it may be possible to modify the colors via augmentation to train the model to rely on other colors (Color figure online)

Additionally, Fig. 8 shows safe and border prototypes for a baseball field (majority class) and rafts (minority class) from a CNN trained on the Places-100 dataset with cross-entropy loss. For the baseball field, the most salient colors used by the model to detect class instances are black, green and brown. In the safe prototype, we can see black leggings on the player’s uniform, green grass and a brown infield. In the border prototype, we can see a black background (over the fence), player black shoes, and green grass. In the case of rafts, green and brown are not as prevalent in the model’s top 10% most salient pixels. Instead, white (white water rapids) and the black background are more important.

Use cases Users of CNNs trained on imbalanced data may use this visualization to better understand the major color bands that are prevalent across a class. When combined with class prototype visualization, it can also provide intuition into whether a classifier is using background colors (e.g., blue sky or clouds) to discern a class. For imbalanced learning algorithm developers, it can suggest specific pixel color groups that may be over- or under-sampled at the front-end of image processing to improve classifier accuracy.

5 Limitations and future directions

There are several potential limitations to our research that should be seen as future directions in developing XAI and IML systems for imbalanced data. First, our techniques were applied to datasets comprising object recognition in natural scenes. A future research direction could be to extend these techniques to object detection and indoor settings. Second, we focused on datasets where the numbers of class instances were imbalanced. A potential future research direction could be to extend our research to adversarial example analysis. For example, when an adversarial instance is misclassified, (1) which feature embeddings caused the misclassification, (2) what is the distribution of classes that are falsely predicted by adversarial examples, and (3) which input image colors or features does the model struggle with when small perturbations are made to an image class?

6 Conclusion

We present a framework that can be used by both model users and algorithm developers to better understand and improve CNNs that are trained with imbalanced data. Because modern neural networks depend on large quantities of data to achieve high accuracy, understanding how these models use complex data and are affected by class imbalance is critical. Our Class Archetypes allow model users to quickly identify a few prototypical instances in large datasets for visual inspection, revealing safe, border, rare and outlier instances of a class in datasets with multiple classes and a large number of examples. Our Nearest Adversaries visualization enables model users and developers to identify specific classes that overlap in a multi-class setting and provides a “heatmap” of the classes causing the greatest overlap. Our feature overlap visualization allows model users to identify specific latent features that overlap and cause model confusion. Finally, our colors that define classes technique permits model users to understand the specific colors that a model relies on when making decisions for an entire class, which provides insight into potentially spurious feature selection and the role of background scene context in model decisions.