1 Introduction

Interpreting complex nonlinear machine-learning models is an inherently difficult task. A common approach is the post-hoc analysis of black-box models for dataset-level interpretation (Murdoch et al., 2019) using model-agnostic techniques such as permutation-based variable importance, and graphical displays such as partial-dependence plots that visualize main effects while integrating over the remaining dimensions (Molnar et al., 2020).

These tools are mostly limited to displaying the relationship between the response and one (or sometimes two) predictor(s), while attempting to control for the influence of the other predictors. This can be rather unsatisfactory when dealing with a large number of highly correlated predictors, which are often semantically grouped. While the literature on explainable machine learning has often focused on dealing with dependencies affecting individual features, for example by introducing conditional diagnostics (Strobl et al., 2008; Molnar et al., 2023), practical solutions for model interpretation in high-dimensional feature spaces with strong dependencies are a current area of research (Molnar et al., 2020, 2022; Seedorff & Brown, 2021; Au et al., 2022).

High-dimensional situations with strongly dependent features routinely occur in environmental remote sensing and other geographical and ecological analyses (Landgrebe, 2002; Zortea et al., 2007), which motivated the present proposal to enhance existing model interpretation tools by offering a new, transformed perspective. Similar issues occur in biomedical applications involving, for example, speech signal processing (Sakar et al., 2019) and Raman spectroscopy (Guo et al., 2020). With regard to remote sensing, for example, vegetation ‘greenness’ as a measure of photosynthetic activity is often used to classify landcover or land use from satellite imagery acquired at multiple time points throughout the growing season (Peña & Brenning, 2015). Spectral reflectances of equivalent spectral bands (the features) are usually strongly correlated within the same phenological stage since vegetation characteristics vary gradually. Similarly, when using texture features to characterize image structure based on a filter bank, features with similar filter settings can be strongly correlated, as in our case study (Brenning et al., 2012).

Feature engineering and feature selection offer a variety of approaches to address this problem by creating more interpretable features or reducing dimensionality and redundancy (Molnar et al., 2020; Verdonck et al., 2021). Nevertheless, neither of these approaches is a viable option in post-hoc analyses. Also, experience shows that feature selection may lead to a decline in predictive performance, and feature engineering often increases dimensionality rather than reducing it, as engineered features are usually added to a feature set rather than replacing the original features.

Considering these challenges, and the inherent need to reduce complexity at the time of interpreting an already trained model, this paper proposes a novel strategy to visualize machine-learning models along cross-sections or paths through feature space. In many situations, principal components (PCs) offer a particularly appealing perspective onto feature space from a practitioner’s point of view (Seedorff & Brown, 2021; Brenning, 2023), although the proposed approach is not limited to this transformation as it also accommodates nonlinear embeddings. In addition, a modification is proposed that focuses on subgroups of features and their principal axes in order to allow for a more structured approach to model interpretation that is more consistent with the data analyst’s domain knowledge. These different perspectives on re-expressing features can be referred to as data-driven or knowledge-driven, depending on the degree to which domain knowledge is leveraged to inform this step (Au et al., 2022).

Turning our attention back to the importance of individual features, an orthogonalization technique can be used to single out the effect of individual features on model predictions, avoiding the sometimes complex structure of PCs. A similar algorithm has been proposed in previous work (Adebayo & Kagal, 2016) in an isolated form that can be accommodated within the proposed general framework. This approach can, as proposed in this contribution, be applied to paths through feature space, such as nonlinear curves defined by domain-specific perspectives, or to data-driven transitions between clusters of observations.

Considering the outlined challenges and existing partial solutions, the objective of the present work is to establish a general formal framework for the post-hoc interpretation of black-box models in transformed space. The proposed framework can be combined with commonly used plot types and diagnostics including partial dependence plots, accumulated local effects (ALE) plots, permutation-based variable importance, and Shapley additive explanations (SHAP), among other model-agnostic techniques that only have access to the trained model (Apley & Zhu, 2020; Molnar, 2022). While the focus of this contribution is on visualizing main effects and their predictive importance, analyses of conditional relationships may also benefit from this perspective (Strobl et al., 2008; Molnar et al., 2023). The framework is implemented in an extensible, open-source package in R, the wiml package, which can be combined with existing model interpretation toolboxes.

2 Proposed method

Consider a regression or classification model

$$\begin{aligned} {\hat{f}}:{\textbf{x}}\mapsto {\hat{f}}({\textbf{x}})\in {\mathbb {R}} \end{aligned}$$

that was fitted to a training sample \(L\) in the (original, untransformed) \(p\)-dimensional feature space \(X\subset {\mathbb {R}}^p\). I will assume \({\hat{f}}({\textbf{x}})\in {\mathbb {R}}\); in the case of classification problems, \({\hat{f}}({\textbf{x}})\) shall therefore represent predictions of some real-valued quantity such as the probability or logit of a selected target class. One of the features, referred to as \(x_s\), is selected as the feature of interest, and the remaining features are denoted by \({\textbf{x}}_C\).

2.1 Example: partial-dependence plots

In this situation, the partial-dependence plot of \({\hat{f}}\) with respect to \(x_s\) can formally be defined as

$$\begin{aligned} {\hat{f}}_{x_s,PDP}(x_s)&= E_{{\textbf{X}}_C}\!\left[ {\hat{f}}(x_s,{\textbf{X}}_C)\right] \\&= \int _{{\textbf{x}}_C}\!{\hat{f}}(x_s,{\textbf{x}}_C)\,d{}P({\textbf{x}}_C) \end{aligned}$$

(Molnar, 2022). This plot, which can be generalized to more than one \(x_s\) dimension, was introduced by Friedman (2001) to visualize main effects of predictors in machine-learning models.
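To make this definition concrete, the following minimal R sketch estimates a partial-dependence curve by fixing the feature of interest at grid values and averaging predictions over the empirical distribution of the remaining features. The fitted model `fit`, the data frame `d`, and the feature name are illustrative placeholders (not code from the paper), and real-valued predictions are assumed, as for \({\hat{f}}\) above.

## Minimal sketch of a partial-dependence curve (illustrative only):
## average model predictions over the observed values of the other features.
pdp_curve <- function(fit, d, feature, ngrid = 20) {
  grid <- seq(min(d[[feature]]), max(d[[feature]]), length.out = ngrid)
  fhat <- sapply(grid, function(v) {
    d_mod <- d
    d_mod[[feature]] <- v                 # fix x_s at the grid value
    mean(predict(fit, newdata = d_mod))   # average over the empirical x_C distribution
  })
  data.frame(grid = grid, fhat = fhat)
}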

The approach outlined in this section can be applied to ALE plots and related model-agnostic tools, including permutation-based variable importance and its conditional modifications, or Shapley additive explanations (see reviews by Molnar et al., 2020; Molnar, 2022).

Partial-dependence plots have some disadvantages such as the extrapolation of \({\hat{f}}\) beyond the region in \(X\) for which training data is available (Apley & Zhu, 2020; Molnar et al., 2022). This is especially the case when predictors are strongly correlated, as in our case study. Nevertheless, without loss of generality, this simple plot type helps to illustrate the proposed approach.

2.2 Transformed feature space

When several predictors are strongly correlated and/or express the same domain-specific concept, such as ‘early-season vegetation vigour’ in vegetation remote sensing, we may be more interested in exploring the overall effect of these predictors. Principal component analysis (PCA) and related data transformation techniques such as factor analysis are tools that are often used by practitioners to synthesize and interpret multivariate data (for example, Basille et al., 2008; Rousson & Gasser, 2004; Cunningham & Ghahramani, 2015).

More generally speaking, we could think of an invertible transformation function

$$\begin{aligned} {\textbf{T}}: X \rightarrow W\subset {\mathbb {R}}^p,\quad {\textbf{w}} = {\textbf{T}}({\textbf{x}}) \end{aligned}$$

that can be used to re-express the features in our data set. We will assume that \({\textbf{T}}\) is continuous and differentiable. PCA is one such example, which has been considered recently by Seedorff and Brown (2021) with a focus on a practical algorithm and its implementation.

Through the composition of the back transformation \({\textbf{T}}^{-1}\) and the model function \({\hat{f}}\), we can now formally define a model \({\hat{g}}\) on \(W\),

$$\begin{aligned} {\hat{g}} := {\hat{f}}\circ {\textbf{T}}^{-1}, \end{aligned}$$

which predicts the real-valued response based on ‘data’ in \(W\) although it was trained using a learning sample \(L\subset X\) in the untransformed space.

We can use this to formally re-express the partial-dependence plot as a function of \(w_s\):

$$\begin{aligned} {\hat{f}}_{w_s,PDP}(w_s)&= E_{{\textbf{W}}_C}\!\left[ ({\hat{f}}\circ {\textbf{T}}^{-1})(w_s,{\textbf{W}}_C)\right] \\&=\int _{{\textbf{w}}_C}\!({\hat{f}}\circ {\textbf{T}}^{-1})(w_s,{\textbf{w}}_C)\,d{}P({\textbf{w}}_C) \end{aligned}$$

Note that \({\textbf{T}}^{-1}\), when used only on data in \({\textbf{T}}(X)\), does not create \({\textbf{x}}\) values outside the data-supported region \(X\), and it therefore avoids extrapolation of \({\hat{f}}\).

Also, when choosing PCA for \({\textbf{T}}\) as a data-driven approach, the \({\textbf{w}}\) variables in \({\textbf{T}}(L)\) are uncorrelated, and statistically independent if \(L\) arises from a multivariate normal distribution. Thus, the PCA approach overcomes one of the limitations of partial-dependence plots and broadens their applicability, at least in the case of linear dependencies.
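As a concrete illustration, the following R sketch expresses the composition \({\hat{g}} = {\hat{f}}\circ {\textbf{T}}^{-1}\) for a PCA transformation and evaluates a partial-dependence curve along the first principal axis. The fitted model `fit`, the feature data frame `X`, and the availability of a generic `predict` method are assumptions made for this example; this is not code from the wiml package.

## Minimal sketch (illustrative): g-hat = f-hat o T^{-1} with T given by PCA.
pca <- prcomp(X, center = TRUE, scale. = TRUE)   # T: X -> W
W   <- as.data.frame(pca$x)                      # transformed 'data' T(L)

predict_g <- function(w) {                       # evaluate f-hat o T^{-1}
  xs <- as.matrix(w) %*% t(pca$rotation)         # back-rotate to scaled features
  x  <- sweep(sweep(xs, 2, pca$scale, "*"), 2, pca$center, "+")  # undo scaling/centering
  predict(fit, newdata = as.data.frame(x))
}

## Partial-dependence curve along the first principal axis:
grid <- seq(min(W$PC1), max(W$PC1), length.out = 20)
pdp_pc1 <- sapply(grid, function(v) {
  w <- W; w$PC1 <- v
  mean(predict_g(w))
})

Because \({\textbf{T}}^{-1}\) is only applied to points whose remaining coordinates are taken from \({\textbf{T}}(L)\), the reconstructed feature values stay within the data-supported region, in line with the argument above.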

2.3 Partial orthogonalization

In some instances, PCs (and other multivariate transformations) of large and complex feature sets can be difficult to interpret, and analysts would therefore like to focus on individual features that are perhaps ‘representative’ of a larger group of features — for example, vegetation greenness in mid-June may be a good proxy for vegetation greenness a few weeks earlier and later, as expressed by other features in the feature set (Peña & Brenning, 2015).

This can be addressed by a transformation of \(X\) in which \(w_s := x_s\) is retained, while the remaining transformed features are made (linearly) uncorrelated with \(x_s\). This can be achieved through partial orthogonalization,

$$\begin{aligned} w_i := x_i - a_i - b_i x_s, \end{aligned}$$
(1)

where \(a_i\) and \(b_i\) are the intercept and slope of a simple linear regression of \(x_i\) (as the response) on \(x_s\). For simplicity of notation, it is assumed that the data are centered and standardized beforehand, in which case \(a_i = 0\) and \(b_i\) equals the Pearson correlation coefficient of \(x_i\) and \(x_s\).

This then defines a linear transformation \({\textbf{T}}:X\rightarrow W\), which can be represented by its coefficient matrix. Note that \({\textbf{T}}\) can be inverted using

$$\begin{aligned} x_i = w_i + b_i w_s, \end{aligned}$$
(2)

since \(x_s = w_s\), and assuming that all \(|b_i| < 1\), which is the case when there are no duplicated features. A related iterative orthogonalization approach has previously been proposed in the context of feature ranking (Adebayo & Kagal, 2016).
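For illustration, a minimal R sketch of Eqs. (1) and (2) for standardized features follows; the numeric feature matrix `X` and the column index `s` of the feature of interest are placeholders, and the code is not taken from the wiml implementation.

## Partial orthogonalization (Eq. 1) and its inverse (Eq. 2) for standardized X.
partial_orthogonalize <- function(X, s) {
  b <- as.numeric(cor(X[, s], X))     # b_i: correlations with x_s (b_s = 1)
  W <- X - outer(X[, s], b)           # w_i = x_i - b_i * x_s   (Eq. 1, a_i = 0)
  W[, s] <- X[, s]                    # retain w_s := x_s
  list(W = W, b = b)
}

invert_orthogonalization <- function(W, b, s) {
  X <- W + outer(W[, s], b)           # x_i = w_i + b_i * w_s   (Eq. 2)
  X[, s] <- W[, s]                    # x_s = w_s
  X
}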

2.4 Partial orthogonalization for dependence plots along synthetic features

Domain scientists may more generally want to visualize the effect of a real-valued function of multiple features. As an example, knowing that several features are strongly correlated, how does the response vary with their average, or, more generally, a linear or nonlinear function of the features? This information is sometimes hidden in an ocean of individual main-effects plots or variable-importance measures.

In other situations, there may be simple process-based models that have the potential to provide deeper insights into black-box models based on domain knowledge. These models may be candidates for an enhancement of feature space, or they might express specific theories or hypotheses.

Any of these transformations can be thought of as a real-valued function of the other features in the data set, \(h({\textbf{x}})\), which is added to the feature set as a new feature \(x_{p+1}:=h({\textbf{x}})\), augmenting the feature space by one dimension. While this feature is not actually used by \({\hat{f}}\), the partial orthogonalization technique offers an entry point for examining how \(h({\textbf{x}})\), through its (linear) association with \(x_1,\ldots ,x_p\), impacts the predictions produced by \({\hat{f}}\).

In different use cases there may be different ways of constructing synthetic features of interest to the domain scientist:

  • A group of strongly positively correlated features could be averaged to obtain an overall signal (example: daily mean temperature when the actual features are hourly temperatures).

  • Contrasts between groups of features could be calculated (example: average of daytime temperature features minus average of nighttime temperature features as a measure of diurnal temperature amplitude).

  • A linear path can be drawn from one cluster centre to another, where cluster centres \({\textbf{c}}_1, \ldots , {\textbf{c}}_k\in X\) are obtained by unsupervised clustering in feature space (for example, \(k\)-means). The path between clusters \(1\) and \(2\) is simply defined as \(t{\textbf{c}}_1+(1-t){\textbf{c}}_2\) for \(t\in [0,1]\), etc. Here, an instance’s distance to a cluster centre could serve as a synthetic feature.

  • A linear path could also be drawn between user-defined points in feature space; in remote sensing, for example, between so-called endmembers representing spectral characteristics of ‘pure’ surface types such as asphalt or water (Somers et al., 2016).

Evidently, these synthetic features could also be added to the feature set in a feature engineering step and used for model training. Nevertheless, the proposed approach provides an opportunity for a post-hoc assessment and visualization of the influence of such features on the model’s output.

Technically, partial orthogonalization along synthetic features is achieved by (linearly) partialing the effect of \(x_{p+1}\) out of the features \(x_1,\ldots ,x_p\) (Eq. (1)) — either out of all features, or out of the subset of features effectively involved in the calculation of \(h({\textbf{x}})\). In applying interpretation tools such as ALE plots or permutation methods, these features then need to be reconstructed from new values of \(x_{p+1}\), which is achieved by inverting the partial orthogonalization (Eq. (2)).
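The following R sketch illustrates this procedure for a synthetic feature; the averaging function `h`, the hypothetical column prefix `gabor_`, and the numeric feature matrix `X` are illustrative assumptions rather than parts of the package.

## Minimal sketch: augment X with a synthetic feature x_{p+1} = h(x), partial it
## out of the original features (Eq. 1), and reconstruct them for new values of
## x_{p+1} (Eq. 2). Here h() averages a hypothetical group of correlated features.
h <- function(X) rowMeans(X[, grep("^gabor_", colnames(X))])

augment_and_orthogonalize <- function(X, h) {
  x_new <- h(X)
  b <- apply(X, 2, function(xi) cov(xi, x_new) / var(x_new))   # slopes b_i
  a <- colMeans(X) - b * mean(x_new)                           # intercepts a_i
  W <- sweep(X, 2, a) - outer(x_new, b)                        # Eq. (1)
  list(W = W, a = a, b = b, synth = x_new)
}

reconstruct_features <- function(W, a, b, synth_new) {
  sweep(W, 2, a, "+") + outer(synth_new, b)                    # Eq. (2)
}

New grid or permuted values of the synthetic feature would then be passed through reconstruct_features() before evaluating \({\hat{f}}\), as required when computing ALE curves, partial-dependence plots, or permutation importances.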

2.5 Two-dimensional model-agnostic plots

The proposed approaches are not limited to one-dimensional model interpretation along one selected feature \(x_s\in {\mathbb {R}}\) — the methods equally apply to bivariate relationships (\({\textbf{x}}_s\in {\mathbb {R}}^2\)), which can be used to display pairwise interactions. Clearly, in a high-dimensional situation, the need to reduce dimensionality in post-hoc model interpretation is even more pressing when interpreting up to \(p(p-1)/2\) pairwise interactions, and the proposed approach offers a practical tool to address this in situations where dimension reduction is viable.

3 Implementation

The proposed methods have been implemented in the R package wiml (code available at https://github.com/alexanderbrenning/wiml). It implements transformation functions called ‘warpers’ based on PCA (of all features or a subset of features), structured PCA (for multiple groups of features), and partial feature orthogonalization, all of which are based on rotation matrices and therefore share a common core. Due to the modular and object-oriented structure, users can implement their own transformations without requiring changes to the package.

These warpers can be used to implement the composition \({\hat{f}}\circ {\textbf{T}}^{-1}\) by ‘warping’ a fitted machine-learning model. The resulting object behaves like a regular fitted machine-learning model in R, offering an adapted predict method. From a user’s perspective, the resulting model feels as if it had been fitted to the transformed data \({\textbf{T}}(L)\), except that it hasn’t. This ‘warped’ fitted model can, in principle, be used with any model-agnostic tool that does not require retraining. An implementation of the composition \(f\circ {\textbf{T}}^{-1}\) involving the untrained model \(f\) is also available; this can be used for drop-and-relearn or permute-and-relearn techniques (Hooker et al., 2021).

The package has been tested and works well with the iml package for interpretable machine learning (Molnar et al., 2018), but it can also be combined with other frameworks since it only builds thin wrappers around standard R model functions. Initial tests with the DALEX framework for explainable machine-learning (Biecek, 2018) and its interactive environment modelStudio (Baniecki & Biecek, 2019) have been successful, as have been tests with the pdp package (Greenwell, 2017). The wiml package does not re-implement existing model interpretation routines.
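To convey the idea behind such a wrapper, the following sketch shows a PCA-based ‘warped’ model with an adapted predict method and a possible combination with iml. This is an illustrative sketch, not the actual wiml API; the objects `fit`, `pca`, `X`, and `y` are assumptions, and the iml calls follow that package’s documented interface.

## Illustrative sketch of a PCA 'warper' with an adapted predict method.
warp_model <- function(fit, pca) {
  structure(list(fit = fit, pca = pca), class = "warped_model")
}

predict.warped_model <- function(object, newdata, ...) {
  xs <- as.matrix(newdata) %*% t(object$pca$rotation)      # T^{-1}: back-rotate
  x  <- sweep(sweep(xs, 2, object$pca$scale, "*"), 2, object$pca$center, "+")
  predict(object$fit, newdata = as.data.frame(x), ...)
}

## Possible combination with the iml package:
# library(iml)
# pca  <- prcomp(X, center = TRUE, scale. = TRUE)
# wm   <- warp_model(fit, pca)
# pred <- Predictor$new(wm, data = as.data.frame(pca$x), y = y,
#                       predict.function = function(model, newdata)
#                         predict(model, newdata = newdata))
# plot(FeatureEffect$new(pred, feature = "PC1", method = "ale"))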

4 Case study

The potential of the proposed methods is demonstrated in a case study from land cover classification, which is a common machine-learning task in environmental remote sensing (for example, Mountrakis et al, 2011; Peña & Brenning, 2015). One particularly challenging task is the detection of rock glaciers, which, unlike ‘true’ glaciers, do not present visible ice on their surface; they are rather the visible expression of creeping ice-rich mountain permafrost. In the present case study, we look at a subset of a well-documented data set consisting of a sample of 1000 labelled point locations (500 presence and 500 absence locations of flow structures on rock glaciers) in the Andes of central Chile (Brenning et al., 2012).

There are 46 features in total, which are divided into two unequal subsets: Six features are terrain attributes (local slope angle, potential incoming solar radiation, mean slope angle of the catchment area, and logarithm of catchment height and catchment area), which are proxies for processes related to rock glacier formation. The other 40 features are Gabor texture features (Clausi & Jernigan, 2000), which are designed to detect the furrow-and-ridge structure in high-resolution (\(1\ \text {m}\times 1\ \text {m}\)) satellite imagery, in this case panchromatic IKONOS imagery (see Brenning et al, 2012, for details). The 40 Gabor features correspond to different filter bandwidths (5, 10, 20, 30 and 50 m), anisotropy factors (1 or 2), and types of aggregation over different filter orientations (minimum, median, maximum, and range). Sample maps of features and IKONOS imagery are shown in Fig. 2 of Brenning et al. (2012).

Texture features with similar filter settings are often strongly correlated with each other. This is especially true for minimum and median aggregation with otherwise equal settings, and for maximum and range aggregation. Overall, the median of each feature’s strongest Pearson correlation is 0.92 (minimum: 0.80). Correlations among terrain attributes are much smaller (median strongest correlation: 0.60). Terrain attributes and texture features are weakly correlated (maximum correlation: 0.30). Correlation statistics are very similar for Spearman’s rank-based correlation.

Table 1 Overall accuracies of random-forest classification of rock glaciers using the entire feature set, and omitting either the terrain attributes or the texture features
Fig. 1 Feature (sub)space diagrams. Top row: First PCs of the entire feature set. Bottom row: First PCs of the texture feature (sub-)set and the top-ranked terrain attributes

To explore the feature sets, PCA is performed both for the entire set of 46 features and for the subset of 40 Gabor features (Fig. 1). In the entire feature set, 63.6% of the variance is concentrated in the first two PCs (first six PCs: 83.7%). In the more strongly correlated Gabor feature set, in contrast, the first two PCs make up 72.2% of the variance (first six PCs: 89.5%). The main PCs turned out to be interpretable by domain experts. PC #1 of the Gabor feature set (‘Gabor1’ in the figures) is essentially an overall average of all texture features, meaning that it expresses the overall presence of striped patterns of any characteristics. Gabor PC #2 represents the contrast between minimum- and median-aggregated anisotropic Gabor features and the rest; large values are interpreted as incoherent patterns with no distinct, repeated stripe pattern. Gabor PC #3 expresses differences between large-wavelength range- or maximum-aggregated features versus the short-wavelength features, which represents the heterogeneity in the width of stripes, and thus the size of linear surface structures. Large values correspond to distinct patterns of large amplitude.

To test the partial orthogonalization along synthetic features, the role of a widely used terrain attribute that is not included in the original feature set is examined. The topographic wetness index (TWI) is defined as the logarithm of the ratio of catchment area to the tangent of slope angle, which means that it is linear in one feature (logarithm of catchment area; correlation 0.94) and slightly curved in another (slope angle; correlation −0.68). Partial orthogonalization is applied with respect to slope and log. catchment area while leaving the other features unchanged. For comparison with the synthetic-feature approach, a model with TWI as an additional feature is also trained.
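Written out in the standard form of this index, the definition used here corresponds to

$$\begin{aligned} \mathrm {TWI} = \ln \frac{A}{\tan \beta }, \end{aligned}$$

where \(A\) denotes the (upslope) catchment area and \(\beta\) the local slope angle; since \(\mathrm {TWI} = \ln A - \ln (\tan \beta )\), it is exactly linear in the logarithm of catchment area and a curved, monotonically decreasing function of slope angle.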

A random-forest classifier is used for the classification of rock glaciers based on the features introduced above. Its overall accuracy, estimated by spatial cross-validation between the two sub-regions (Brenning, 2012), is \(80.8\)%. Omitting terrain attributes from the feature set has a greater impact on performance than omitting the texture features (Table 1).

5 Results

5.1 Standard approach

With 46 features that are grouped into two semantic feature types (terrain attributes, texture features), it can be challenging to interpret the patterns represented by marginal effects plots (Fig.  2). Although there appears to be some consistency in direction among many of the texture features, it is difficult to identify an overall pattern that can be summarized verbally, and it would be unreasonable to present such detailed visual information to a conference audience that is expecting a concise and coherent narrative.

Fig. 2 Ordinary ALE plots for all 46 features

5.2 Interpretation in transformed feature space

The ALE plots along principal axes distill 71.6% of the feature variance into only three plots (Fig. 3). Nevertheless, considering the semantic differences and weak correlations between texture features and terrain attributes, it seems unnecessary to combine all features in a joint PCA, which results in PCs whose meaning is at least slightly mixed in this purely data-driven transformation.

Fig. 3 ALE plots along the first six principal axes, applying PCA to the entire feature set

The structured PCA approach, in contrast, allows us to explicitly separate the model’s representation of the effects of texture features and terrain attributes, which is desirable from a domain expert’s perspective (knowledge-driven transformation) and statistically justifiable given the weak correlations between these feature groups. Larger overall texture signals (Gabor PC #1) are associated with higher predicted rock glacier probability, although extremely large PC #1 values are less discriminative since they may also occur along linear features such as eroded channels (Fig. 4). However, a large contrast between minimum/median anisotropic texture features and the remaining texture features, as expressed by a high Gabor PC #2 value, is more often associated with an absence of rock glaciers. In other words, the absence of coherently oriented, repeated stripes is not typical of rock glaciers; such patterns may be more typical of non-repeated stripes (for example, erosion gullies, jagged rock slopes).

Fig. 4 ALE plots along the first principal axes of texture features, and for the most important terrain attributes

Fig. 5 Permutation and SHAP feature importances of the 10 top-ranked texture principal components and terrain attributes. Bars indicate permutation variability and approximate confidence intervals, respectively, both at the 90% level

The permutation-based and SHAP-based assessments of the importance of texture PCs and terrain attributes both show that subsequent PCs contribute much less to the predictive performance, and that slope angle is the most salient feature overall (Fig.  5). Clearly, the combined importance of Gabor features as summarized by Gabor PCs #1 and #2 provides a more comprehensible summary than an incoherent litany of individual feature importances of strongly correlated features, which should not be permuted independently of each other (Fig.  6).

Fig. 6 Two-dimensional ALE plot (left) and partial-dependence plot (right) with respect to the first and second principal axes of the texture features. Rectangular structures in the ALE plot are due to the ALE algorithm, not due to steps in random-forest predictions

Fig. 7 ALE and partial-dependence plots for a synthetic feature, the topographic wetness index (TWI)

5.3 Visualization of a synthetic feature

The marginal effect of TWI as a synthetic feature is visualized using both ALE and partial-dependence plots (Fig.  7). TWI shows comparable relationships for the synthetic feature in post-hoc interpretation and for the added feature in the re-trained model (not shown).

From a more technical perspective, the use of ‘new’ values of the visualized variable \(x_s=h({\textbf{x}})\) results in an extrapolation of the catchment area and especially slope values reconstructed from \(x_s\) (Fig.  8, top row). This extrapolation is moderate and infrequent in the calculation of ALE curves and SHAP values (range increase by about 25%), while it is quite substantial and frequent for partial-dependence plots and the permutation method (range of slope increases by about \(\pm 20^\circ\) or 100%; log. catchment area by 50%). When jointly looking at two dimensions in feature space, it is, however, remarkable that the amount of extrapolation produced with the synthetic approach in (slope, log.carea) space is comparable to the extrapolation generated in the (slope, TWI) domain when examining the re-trained model with the added TWI feature (Fig.  8, top versus bottom row).

Fig. 8 Distribution of feature values used to compute ALE and partial-dependence plots of TWI. Top row: TWI as synthetic feature from which slope angle and logarithm of catchment area were reconstructed. Bottom row: Results for a model re-trained with TWI as additional feature. Dashed lines highlight range of feature values in the sample. Color represents the logarithm of catchment area (top) and TWI (bottom), respectively

The functional relationship between TWI as a synthetic feature and slope angle and log. catchment area as its inputs is broken up only slightly as a consequence of the near-linearity. Specifically, a deviation >5% from the functional relationship affects only 1% of the function evaluations of \({\hat{f}}\) when constructing ALE plots (partial-dependence plots: 6%; SHAP values: 4%). Such deviations are more frequent when analyzing TWI as an additional feature in a re-trained model (ALE plot: 3.5%; partial-dependence plot: 35%, often by a large margin; SHAP values: 12%).

6 Discussion

Overall, interpretation plots along the principal axes are capable of distilling complex high-dimensional relationships into low-dimensional summaries in a data-driven manner, thus providing a tidier, better structured and more focused approach to model interpretation than traditional tools that focus on individual predictors in an ocean of highly correlated features. This behaviour is highly desirable from a domain expert’s perspective, and applying it in a structured, knowledge-driven manner allows the analyst to honour domain knowledge and feature semantics.

One concern in model-agnostic model interpretation is, or should be, the extrapolation from a training sample to a set of data points at which the model \({\hat{f}}\) is evaluated. This is especially an issue for partial-dependence plots and permutation methods, which require stronger extrapolation than the locally operating ALE method, as well as for non-smooth models such as random forests. In conjunction with the proposed method, this extrapolation takes place in the transformed space \(W\), and as a consequence, it may be exacerbated or reduced, depending on the local properties of the transformation \({\textbf{T}}\). This is an issue that users should be aware of, especially when working with sparse or multimodal data distributions.

Of course, fitting the classifier to PCA-transformed data as input features could have provided direct access to ALE plots along principal axes. However, we would want our feature engineering decisions to be directed towards improving predictive performance, and we would therefore prefer not to risk compromising optimal performance to satisfy our desire to interpret the model. While this is not an issue in the present case study (overall accuracy 0.789 with PC features versus 0.808 with the original predictors), our experience shows that PCA-transformed predictors can worsen predictive performance. Also, model-agnostic post-hoc analysis tools are precisely meant to be applicable to black-box models that are provided ‘as is’, without the possibility of altering their input features, in which case the proposed ‘hands-off’ access to transformed perspectives is particularly valuable.

The proposed use of PCA and related linear transformation techniques appears to be in contradiction to the use of complex nonlinear machine-learning models. Nevertheless, it could be argued that linear cross-sections of feature space along the original feature axes are no less arbitrary and limiting, considering the often strong correlations with other features that would have to be interpreted simultaneously. From that perspective, principal axes provide a ‘tidier’ perspective and a smarter peek into feature space than traditional ALE or partial-dependence plots. Linear transformations similar to PCA may further enhance interpretability by offering a more structured or target-oriented perspective based on simple components (Rousson & Gasser, 2004) or discriminant functions (Cunningham & Ghahramani, 2015).

One could even argue that ALE plots can be misleading for highly correlated features as they look at the often tiny contributions of individual features in an isolated way, while the proposed approach focuses on the bigger picture and captures the combined effect of a bundle of features. This also becomes evident in feature importance assessments, where individual texture features consistently achieve discouragingly low importances, while the first two PCs of the texture features are ranked very highly. For the specific case of permutation assessments, it has also been proposed to jointly permute groups of features (Molnar, 2022); unlike the techniques proposed here, this approach is not transferable to other interpretation tools that are not based on permutations. Au et al. (2022) have therefore proposed approaches to assess the joint importance of feature groups and visualize joint marginal effects for linear combinations of features based on sparse supervised PCA for improved interpretability.

Beyond linear transformations, the proposed approach provides a general framework even for nonlinear perspectives on feature space and model functions. In particular, the paths proposed in Sect. 2.4 may be nonlinear, for example as defined by a physical model that domain experts could use to check model plausibility in a knowledge-driven way. Nevertheless, nonlinear transformations in particular should only be used in conjunction with interpretation tools, such as ALE plots and SHAP values, that aim to preserve correlations among features, and non-monotonic mappings should be avoided.

In a more data-driven approach in multivariate space, curvilinear component analysis (CCA) and autoencoders, as state-of-the-art multivariate nonlinear transformation methods, provide a logical extension of PCA and highlight the link between explainable machine learning and projection-based visual analytics (Schaefer et al., 2013).

Finally, due to the orthogonality and thus uncorrelatedness of PCs, the more naturally interpretable partial-dependence plots become a more viable option for the interpretation of black-box machine-learning models. In original feature space, in contrast, the less intuitive and sometimes rather coarse ALE plots should usually be preferred despite their limitations (Fig. 6).

7 Conclusions

Despite the inherent limitations of post-hoc machine-learning model interpretation, feature-space transformations, and structured PCA transformations in particular, are a powerful tool that allows us to distill complex nonlinear relationships into an even smaller number of univariate plots than previously possible, representing perspectives that are informed by domain knowledge. These transformations provide intuitive access to feature space, and they can easily be wrapped around existing model implementations. Model interpretation through the lens of feature transformation and dimension reduction allows us to peek into the feature space at an oblique angle — a strategy that many of us have successfully applied when checking whether our kids are asleep, and a much more successful strategy than staring along the walls, that is, the original feature axes, especially when these are nearly parallel.