1 Introduction

As more actors in the industry and public sector adopt AI-based solutions to improve their services, the impact of these solutions on society and individuals is being scrutinized due to ethical and safety concerns. A systematic review of existing principles and guidelines for ethical AI [1] across 84 documents from 12 countries found that transparency was the most widely agreed upon ethical AI principle, mentioned in 73 out of 84 sources. While the principle of privacy ranked fifth in popularity, being mentioned in 47 out of 84 sources, it is by far the most legally protected, with many national laws protecting the privacy of personal data. This includes Article 8 of the European Convention on Human Rights, the General Data Protection Regulation (GDPR) in the EU, the California Consumer Privacy Act (CCPA) in the US, the Personal Data Protection Bill in India, the Act on the Protection of Personal Information in Japan, and many others.

Certain laws protect both privacy and explainability rights. For example, the European Union’s General Data Protection Regulation (GDPR) includes principles related to processing personal data that require data processing to be lawful, fair, and transparent. It also stipulates, in Articles 13–15, that those handling and processing data must provide clear, understandable information about the logic behind automated decision-making to non-experts. To fully comply with these provisions, it is necessary for any person or organization subject to them to both protect individuals’ privacy and explain how machines have arrived at decisions that impact individuals. While there is no clear agreement on what constitutes a meaningful explanation, legal scholars have noted that the demands for privacy may conflict with the demands for explainability [2].

Two areas of active research focused on addressing the challenges of preserving privacy and promoting explainability in artificial intelligence (AI) systems are privacy-preserving AI and explainable AI. Privacy-preserving AI refers to techniques and approaches that aim to protect the privacy of individuals while still allowing their data to be used to train or operate AI systems. Current solutions to privacy-preserving AI include distributed machine learning, encryption, and data perturbation techniques that decouple the learned model from training data [3,4,5]. These methods often allow multiple users to collaborate on training a machine learning model without exposing their private data, each receiving a trained model as a result of the collaboration. Explainable AI, on the other hand, refers to the development of AI systems that are able to provide clear and understandable explanations for their decisions and actions. In the context of algorithmic transparency, there is a demand for human-readable explanations of algorithmic decisions that can be appealed, and their fairness evaluated [6].

However, there has been limited research on approaches for developing artificial intelligence (AI) systems that are both privacy-preserving and explainable, especially in situations where these two goals may be in conflict. Most work on distributed machine learning revolves around the originally proposed method of Federated Averaging applied to Deep Neural Networks, which are considered opaque, black-box models. Yoo et al., giving an overview of current issues with the real-world application of federated learning in the medical field, observed that FL’s main advantage, the protection of sensitive data, becomes its main disadvantage in medical applications because it becomes impossible to track the origins of errors made by the network trained in federation [7]. In a recent survey of federated learning of explainable machine learning models, Barcena et al. named three major challenges facing the field: preventing privacy leaks through explanations, dealing with massive data streaming scenarios, and merging local explainability models, thus ensuring that all clients get the same explainability rules [8].

In this work, we provide an overview of the existing methods for achieving explainability in the setting of distributed machine learning. We also address the issue of explainability consistency when using post-hoc explainability methods, such as SHapley Additive exPlanations (SHAP), in a type of distributed machine learning called Data Collaboration. SHAP is a model-agnostic method that is commonly used because of its flexibility and its ability to treat the collaboratively trained model as a black box. It also provides intuitive explanations and can identify bias in the training data [9]. However, when using SHAP or similar methods to explain the results of a black-box model, the explanations may be inconsistent due to differences in the input data used by different data holders.

This presents several challenges for the transparency of machine learning models trained in a distributed setting, which we address with our proposed method of Explainable Data Collaboration: (i) if some users are members of more than one horizontally partitioned dataset, they should not get contradictory explanations for the same model from different data holders; (ii) if data bias present in one data holder has been corrected through collaboration, the obtained explanations should no longer display the bias; (iii) if users have their data vertically partitioned among several data holders, they should be able to obtain correctly proportioned additive feature attributions for the complete set of features (e.g., the Shapley values for all features added to the base value should equal the actual model prediction); (iv) data holders in vertical data collaboration should be able to inspect 'host' feature attributions with 'guest' features hidden; simultaneously, their view of feature attributions should be consistent with the user-side view for the complete set of features.

The main contributions of this work are summarized as follows:

  1. We identify the problem with explainability alignment that the conventional application of the SHAP method in distributed machine learning may produce and demonstrate it with a case study on open-access data.

  2. We propose a Horizontal DC-SHAP algorithm that uses a shareable anchor dataset as a baseline to produce consistent explanations among all collaborators and experimentally verify the consistency of our method on open data.

  3. We propose a Vertical DC-SHAP (i) algorithm that provides client-side feature attributions for the whole set of features and a Vertical DC-SHAP (ii) algorithm that provides a partial view of feature attributions visible to one of the data holders; we further demonstrate the consistency of algorithms (i) and (ii).

The problem of explainability in Federated Learning has been previously addressed with approaches ranging from the development of federated versions of interpretable machine learning models [8] to the application of various post-hoc explainability methods [10]. However, to the best of our knowledge, the issue of inconsistency in the explanations provided by explainability algorithms in distributed machine learning has not been addressed before.

The practical implications of the proposed Explainable Data Collaboration Framework consist in improved transparency and consistency of the explanations obtained by different participants, which can enhance trust in the product and enable ethical application in various industries. The proposed algorithms for different scenarios of explainability in Data Collaboration can be adapted to different use cases, facilitating the adoption of privacy-preserving distributed machine learning in real-world applications. Moreover, the research findings presented in this paper aim to contribute to the growing body of knowledge on privacy-preserving distributed machine learning and explainable AI, and to inspire further research in these fields.

This work may be of interest to researchers, data scientists, machine learning engineers, policymakers, and practitioners interested in developing or deploying privacy-preserving distributed machine learning models. While it presents a solution to a particular technical problem, namely the adaptation of the KernelSHAP method to the Data Collaboration framework, it may inspire similar solutions for other methods of explainability and privacy-preserving distributed machine learning. The key takeaways for readers are the importance of transparency and explainability in privacy-preserving distributed machine learning, an understanding of the difficulties in achieving consistency of explanations obtained by different participants, and a practical approach to addressing these difficulties.

In the following section, we will provide a brief overview of existing approaches to distributed machine learning and explainable machine learning. We will also describe in detail two algorithms that are central to this paper: Data Collaboration and SHapley Additive exPlanations (SHAP). In the Methodology section, we will present our proposed method for achieving explainability for Data Collaboration in two settings: horizontally and vertically partitioned data. In the Experiments section, we will test the consistency of our explainability algorithms using open-access data, and we will draw conclusions in the final section.

2 Related Work

To present a comprehensive understanding of the existing literature on this topic and to highlight the gaps in current research, we reviewed all papers we could find that either addressed the problem of explainability in a distributed machine learning setting or proposed solutions for combining distributed machine learning with explainable machine learning. We summarize our findings in Sect. 2.3 of this paper. However, as this area of research lies at the intersection of two vibrant fields, namely Distributed Privacy-Preserving Machine Learning and Explainable Machine Learning, we also give a brief overview of the main methods in those fields in Sects. 2.1 and 2.2 to introduce the background of our research problem and to assist the readers in understanding the component methods.

2.1 Distributed Privacy-Preserving Machine Learning

Distributed and privacy-preserving machine learning has been studied in a variety of approaches, all aimed at creating a model without exposing training data to the analyst or other users of the model. Many of these approaches involve aggregating model parameters in some way. For example, in Private Aggregation of Teacher Ensembles (PATE) [11], privately trained models are used to train a global student model with differential privacy. Another popular approach is Federated Learning, which trains a global machine learning model through iterative rounds of averaging local models’ parameter updates [12]. In addition, Vertical Federated Learning addresses a specific problem setting where data is partitioned in feature space rather than sample space [13]. These and other approaches to privacy-preserving machine learning in a distributed setting are known as Federated Learning Systems [14].
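To make the parameter-aggregation idea concrete, the following is a minimal sketch of a single Federated Averaging round. It assumes each client exposes a hypothetical local_update routine that trains on its private data and returns updated weights together with its sample count; it illustrates the general mechanism only and is not a reproduction of any specific system cited here.

```python
# Minimal sketch of one Federated Averaging (FedAvg) round [12].
# Each client's hypothetical local_update() trains on private data and
# returns (weights, n_samples); only model parameters leave the client.
import numpy as np

def fedavg_round(global_weights, clients):
    updates, sizes = [], []
    for client in clients:
        w, n = client.local_update(global_weights)  # local training step
        updates.append(w)
        sizes.append(n)
    # Average client parameters, weighted by local sample counts
    return np.average(np.stack(updates), axis=0, weights=np.asarray(sizes, float))
```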

The Data Collaboration (DC) method was proposed in 2020 [5]. Unlike Federated Learning, it is a non-model-sharing type of distributed machine learning in which participants have different model components in their pipeline [15, 16]. It uses irreversible transformations of dimensionality reduction that are executed locally by users to create intermediate representations of their data that can be shared. The central analyzer then combines these intermediate representations into a single dataset and trains a machine learning model, which is then distributed back to each user. The integration step differs depending on whether the data is horizontally or vertically partitioned. In a vertical partition setting, integration consists of simply concatenating the intermediate representations. In a horizontal partition setting, an additional integrating transformation is required, which is computed using auxiliary synthetic data shared among the users. This synthetic data, called anchor data in the original Data Collaboration paper, is produced by collaborators as randomly generated values within the distribution of the original data features. Figure 1 illustrates the horizontal and vertical Data Collaboration mechanisms for two users. The task of interpretability in Data Collaboration, which involves obtaining overall feature importance, has previously been addressed by building a surrogate model from DC model predictions on shareable anchor data [17]. The focus of this paper is on explainability, which involves attributing feature importance to individual samples.
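As a rough illustration of the horizontal mechanism, the sketch below builds a two-user Data Collaboration pipeline. It assumes truncated SVD for the private maps \(f_i\) and a least-squares alignment of anchor representations for the integrating maps \(g_i\); both choices, as well as the hyperparameters, are simplifying assumptions rather than the reference procedure of [5].

```python
# Illustrative two-user horizontal Data Collaboration (cf. Fig. 1a).
# Assumptions: f_i = truncated SVD; g_i = least-squares map aligning each
# user's anchor image to user 0's anchor image. Not the exact method of [5].
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.neighbors import KNeighborsClassifier

def horizontal_dc(X_parts, y_parts, X_anchor, dim=9):
    F, anchor_reps, inter_reps = [], [], []
    for X in X_parts:
        f = TruncatedSVD(n_components=dim).fit(X)      # private map f_i
        F.append(f)
        anchor_reps.append(f.transform(X_anchor))      # shareable anchor image
        inter_reps.append(f.transform(X))              # intermediate representation
    # Integrating maps g_i computed from the shared anchor representations
    G = [np.linalg.lstsq(a, anchor_reps[0], rcond=None)[0] for a in anchor_reps]
    X_dc = np.vstack([ir @ g for ir, g in zip(inter_reps, G)])
    y_dc = np.concatenate(y_parts)
    h = KNeighborsClassifier(n_neighbors=7).fit(X_dc, y_dc)   # shared model h
    return F, G, h
```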

Fig. 1 Data Collaboration (DC): Horizontal DC (a) and Vertical DC (b), where \(X_i\) are private datasets, \(f_i\) are private transformation functions, \(X^{anc}\) is shared anchor data, \(L_i\) are labels for supervised learning tasks, \(g_i\) are integrating transformation functions, and \(h\) is a machine learning model

2.2 Explainable Machine Learning

The goal of explainability is to interpret the prediction given by a model on a given input by attributing relative importance to each feature of the input. There are two main approaches to explainable AI (XAI): transparent models, which are designed to be interpretable, and post-hoc explainability, which involves using external techniques to explain the behavior of models that are not inherently interpretable. It should be noted that the boundaries between these categories can be blurry [18].

Among the inherently transparent models are the following: Rule-Based Systems (RBSs), Decision Trees (DTs), Linear/Logistic Regression, k-Nearest Neighbors, Generalized Additive Models, and Bayesian Models. The parameters of these models can be easily interpreted to understand how they produce their output given a specific input.

Among the post-hoc explainability methods, researchers further distinguish model-agnostic techniques, which can be applied to any model to provide explainability, and model-specific techniques, which are designed to explain certain types of machine learning models. An example of a model-specific post-hoc explainability method is integrated gradients [19], one of several methods for visualizing the internal states of neural networks. Two commonly used model-agnostic explainability methods are LIME [20] and SHAP [21]. The latter work proposed that several explainability methods can be unified under the formulation of additive feature attribution methods,

$$\begin{aligned} g(z') = \phi _0 + \sum ^M_{i=1}{\phi _iz'_i}, \end{aligned}$$
(1)

where \(z' \in {\{0,1\}}^M\) is a binary vector indicating the presence or absence of each of the M simplified input features, and \(\phi _i\in {\mathbb {R}}\) is the attribution of importance to feature i.

In the same paper, the authors conjectured that only the methods based on game-theoretic Shapley values satisfy three desirable properties of model explanations: accuracy, missingness, and consistency. For the purposes of this paper, the consistency property is of special importance. It states that for any two models f and \(f'\), if

$$\begin{aligned} f'_x(z') - f'_x(z' / i) \ge f_x(z') - f_x(z' / i) \end{aligned}$$
(2)

for all inputs \(z' \in {\{0,1\}^M}\), where \(z'/i\) denotes \(z'\) with \(z'_i\) set to 0, then \(\phi _i(f', x) \ge \phi _i(f,x)\). In other words, for a given sample x, if the change in the output of \(f'\) from introducing a simplified feature \(z'_i\) is greater than or equal to the change in the output of f, then the attribution value of feature i in model \(f'\) should not be less than its attribution in model f.

To extend the consistency principle to the Data Collaboration method, we let f be the collaboration model h composed with the individual transformation functions \(f_1\) and \(g_1\) of the first user, and \(f'\) be the collaboration model h composed with the functions \(f_2\) and \(g_2\) of the second user (Fig. 1a). Then, according to inequality (2), if the collaboration model of one user relies more on a certain feature, this feature should receive a larger attribution in the explanations for that user. This important property makes Shapley-value-based methods suitable for providing explainability for the Data Collaboration method.

Algorithm 1 presents the procedure of the model-agnostic KernelSHAP method [21].

Algorithm 1 KernelSHAP
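As a reference for the later sections, the following brute-force sketch illustrates the KernelSHAP procedure for a small number of features: enumerate all coalitions \(z'\), impute absent features from a reference vector, and solve the Shapley-kernel-weighted least-squares problem. It is an exhaustive illustration under these assumptions, not the optimized implementation of [21]. Here, predict is expected to map a batch of simulated inputs to scalar model outputs, for example the positive-class probability of a classifier.

```python
# Brute-force sketch of KernelSHAP: enumerate all 2^M coalitions, impute
# absent features from a reference vector, and fit a weighted linear model
# with the Shapley kernel. Intended only for small M.
import itertools
from math import factorial
import numpy as np

def kernel_shap(predict, x, reference):
    x, reference = np.asarray(x, float), np.asarray(reference, float)
    M = len(x)
    Z = np.array(list(itertools.product([0, 1], repeat=M)), dtype=float)
    X_sim = Z * x + (1 - Z) * reference        # present -> x, absent -> reference
    y = predict(X_sim)                         # model evaluations on coalitions
    w = np.empty(len(Z))
    for j, z in enumerate(Z):
        s = int(z.sum())
        if s in (0, M):
            w[j] = 1e6                         # pin base value and full prediction
        else:
            n_choose_s = factorial(M) // (factorial(s) * factorial(M - s))
            w[j] = (M - 1) / (n_choose_s * s * (M - s))
    A = np.column_stack([np.ones(len(Z)), Z])  # intercept phi_0 plus M features
    AtW = A.T * w                              # apply Shapley kernel weights
    phi = np.linalg.solve(AtW @ A, AtW @ y)    # weighted least squares
    return phi[0], phi[1:]                     # base value, feature attributions
```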

2.3 Explainable Federated Learning Systems

In this section, we provide a brief overview of existing solutions for combining distributed machine learning with explainable machine learning. There are many methods under the umbrella of federated learning systems, and some of these may be suitable only for specific scenarios and have limited properties. We also analyze the existing methods from the perspective of explainability consistency, which means we consider whether different clients in the federation could potentially receive different explanations of global model predictions for the same data instances. In vertical federated learning, the goal of explanation consistency is slightly different, as it involves correctly displaying the combined effect of the hidden features of other clients.

There is a growing body of work that develops federated protocols for inherently transparent machine learning models, such as decision trees and rule-based systems. In a recent work, Renda et al. developed a federated protocol for learning inherently explainable rule-based models (FED-XAI) for the automated vehicle networking use case [22]. A different approach to training an inherently explainable federated model involves the use of evolutionary rule learning (ERL) to train a federated fuzzy neural network [23]. This method has been shown to produce superior results when the local data at each party is non-independent and identically distributed (Non-IID). Inherently transparent models do not produce misalignment of client-side explanations in a horizontal setting, as all the clients share the same set of interpretable rules. For the case of vertical data partitioning, there is a solution proposed by Wu et al. [24], which involves training a decision tree by means of homomorphic encryption and secure multiparty computation. In this method, there are no provisions for inspecting the effects of hidden features, so each client only knows the split thresholds for the features it owns.

Model-specific post-hoc explainability methods are primarily aimed at making high-performing neural networks explainable, and with minor modifications such methods can be applied to classical federated learning with the FedSGD or FedAvg algorithms. For example, in papers [25, 26], the author uses a horizontal federated deep learning model to predict taxi trip duration and further applies the method of integrated gradients to explain model predictions. In these works, the integrated gradients are averaged and thus unified among the clients.

Model-agnostic post-hoc explainability methods are based on probing the global model with various inputs generated from the local data distribution. Therefore, such methods are prone to misalignment of client-side explanations, unless they are unified by some shared background data or reference points.

Chen et al. [27] have developed an explainable vertical federated learning (FL) framework that incorporates post-hoc interpretation into deep learning models using a federated counterfactual explanation method. Counterfactual explanation is a local explainability technique that aims to explain a prediction by determining the minimum change required to an instance in order for the model to classify it as a specific class. Since counterfactuals are generated from local data distribution, they will differ among the clients for the same data instances.

A team from Kyoto University recently proposed a scheme for calculating SHAP values in a federated manner by the server, relying on homomorphic encryption [28]. This procedure is suitable for use cases of cross-silo horizontal federated learning where neither the test data nor the model can be accessed by the server.

Vertical Federated Learning presents a special challenge for post-hoc explainability, addressed by Wang [10]. In that work, the authors adapted the SHAP algorithm to vertical federated learning, paying special attention to modeling the combined effect of hidden features for a correct and unified representation of feature attributions among all clients. In our work, we extend this effort to adapt the SHAP algorithm to the Data Collaboration framework, in both vertical and horizontal partition settings.

Table 1 summarizes the reviewed methods, specifying the ML models they can support, whether each method is applicable in a horizontal or vertical data partitioning setting, and whether there is a provision for explainability alignment among the clients. While some methods provide consistency of explanations among the clients by design, to the best of our knowledge, no previous work has addressed the problem of consistency in a methodological way, and none of the existing solutions is applicable to the Data Collaboration method of distributed privacy-preserving machine learning, where clients may have different model components, potentially distorting local model explanations. This knowledge gap creates an opportunity for further research, which this paper aims to address.

Table 1 Comparison of Explainable Federated Learning Systems

3 Methodology

3.1 Feature Attribution in Horizontal Data Collaboration

In horizontal collaboration (see Fig. 1a), participants share the same set of features, but a non-IID distribution of data samples can prevent them from getting consistent explanations. This happens because the available background dataset determines the reference value in the calculation of SHAP values.

We solve the issue of differing background datasets by producing the reference value from the anchor data, which is already shared among the users. In particular, we take the median value of the anchor data, though other aggregate statistics are also possible.

In our method (see Algorithm 2), a user who wishes to obtain feature attributions for a model prediction runs the SHAP algorithm on the sample of interest x, the reference value r calculated from the anchor data, and a model composed of the transformation functions F and G individual to the user and the shared machine learning model h. In this way, the only difference from the conventional KernelSHAP algorithm lies in the composition of the model under explanation. Nevertheless, the composed models of different users may still give different attributions to the features of the same sample. We argue that since SHAP values satisfy the principle of consistency, the discrepancies in feature attributions among the users truthfully reflect the discrepancies in the model components trained in collaboration.

Algorithm 2 Horizontal DC-SHAP
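A minimal sketch of this composition is shown below. It reuses the kernel_shap sketch from Sect. 2.2 and assumes that f is the user's local dimensionality-reduction transformer, G the integrating matrix, and h the shared classifier; the median anchor reference follows Algorithm 2, while the concrete pipeline objects are illustrative.

```python
# Sketch of Horizontal DC-SHAP (Algorithm 2): explain the composed model
# h(g(f(x))) against a reference derived from the shared anchor data.
# f, G, and h stand for the user's DC pipeline components; the concrete
# objects used here are illustrative assumptions.
import numpy as np

def horizontal_dc_shap(x, X_anchor, f, G, h):
    r = np.median(X_anchor, axis=0)                    # shared reference value
    def composed_predict(X_raw):                       # user-specific composition
        return h.predict_proba(f.transform(X_raw) @ G)[:, 1]
    return kernel_shap(composed_predict, x, r)         # base value, attributions
```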

3.2 Feature Attribution in Vertical Data Collaboration

In vertical collaboration (see Fig. 1b), users have overlapping data points but different sets of features. Such a situation often arises in a business setting when separate entities collect different information on the same set of individual clients.

The challenge for explainability arises because the feature sets held by the users are disjoint, and every evaluation of a sample by the model requires the contribution of the missing features from all users. For simplicity, we will assume a collaboration of two users, whom we call the host and the guest, following the convention in the literature. Hence, if the host wishes to evaluate the model on a new data point x, they must obtain the intermediate representation of the missing features of x held by the guest.

We imagine two distinct use cases for the explainability of Data Collaboration in a vertical setting.

  1. Attribution is requested at a third party for the complete set of features.

  2. Attribution is requested by one of the users for the partial set of features.

The first use case is important because individuals can often access their own data and should be able to request an explanation of a specific decision made about them by a collaborative machine learning model. Algorithm 3 and Fig. 2 describe an approach that can produce such an explanation.

Algorithm 3 Vertical DC-SHAP (i)

Fig. 2 Computing complete SHAP values in Vertical DC: (1) the third-party user gets reference values (which can be chosen by the third party or supplied by the collaborators); (2) the third-party user constructs artificial input data as a power set of reference and sample values; (3) the collaborators transform the corresponding partial input data; (4) the collaborators centralize the input data and get model predictions; (5) the third-party user computes SHAP values with Algorithm 1
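A condensed sketch of the third-party procedure follows. It assumes each collaborator exposes a hypothetical transformation service (transform_host, transform_guest) that maps its raw feature block to the unified intermediate representation; the shared model h and the kernel_shap helper are as sketched earlier.

```python
# Sketch of Vertical DC-SHAP (i) from the third party's perspective
# (Algorithm 3 / Fig. 2). transform_host / transform_guest are hypothetical
# services that apply each collaborator's f_i and g_i to its feature block.
import numpy as np

def vertical_dc_shap_full(x, reference, host_idx, guest_idx,
                          transform_host, transform_guest, h):
    def predict(X_sim):
        Z_host = transform_host(X_sim[:, host_idx])     # host's block
        Z_guest = transform_guest(X_sim[:, guest_idx])  # guest's block
        return h.predict_proba(np.hstack([Z_host, Z_guest]))[:, 1]
    return kernel_shap(predict, x, reference)           # attributions for all features
```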

In the second use case, when explanations are requested by the host party for the partial set of features, it is reasonable to keep the feature attributions belonging to the guest party secret. However, displaying the aggregated attribution of all guest features helps to maintain the correct proportions of the host features’ impact on the model. A similar approach has been explored by Wang for Federated Learning explainability [10].

To achieve such an output of feature attributions, in Algorithm 4 we construct a power set S of binary feature indicators following the KernelSHAP method (Algorithm 1), but with an additional indicator for the aggregated guest features. We then proceed to construct the simulated model inputs \(X'\), consisting of two parts: one for the host features \(X'_h\) and another for the intermediate representations of the guest features \(X'_g\). The host features subsequently undergo transformation with the host’s dimensionality reduction function \(F_h\) before both parts of the simulated inputs are concatenated and model predictions are obtained. The proposed method is schematically presented in Fig. 3.

Algorithm 4 Vertical DC-SHAP (ii)

Fig. 3 Computing partial SHAP values in Vertical DC: (1) the guest party supplies the intermediate representations of the missing sample and reference values; (2) the host party constructs partial input data as a power set of sample and reference feature values; (3) the host party gets the intermediate representation of the partial input data; (4) the host party constructs an aggregated features vector with intermediate representations obtained from the guest; (5) the collaborators unify the intermediate representations and get model predictions; (6) the host party computes SHAP values with Algorithm 1
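A sketch of the host-side computation is given below. It treats the aggregated guest contribution as one extra binary indicator whose "present" value selects the guest's intermediate representation of the sample and whose "absent" value selects that of the reference; z_guest_x, z_guest_ref, and transform_host are hypothetical inputs standing in for the exchange steps of Fig. 3.

```python
# Sketch of Vertical DC-SHAP (ii) on the host side (Algorithm 4 / Fig. 3).
# The aggregated "DC Features" indicator is modeled as one extra binary
# feature: 1 selects the guest's intermediate representation of the sample
# (z_guest_x), 0 selects that of the reference (z_guest_ref). These vectors
# and transform_host are hypothetical stand-ins for the exchange protocol.
import numpy as np

def vertical_dc_shap_partial(x_host, ref_host, z_guest_x, z_guest_ref,
                             transform_host, h):
    x_aug = np.append(np.asarray(x_host, float), 1.0)    # last entry: guest toggle
    ref_aug = np.append(np.asarray(ref_host, float), 0.0)
    def predict(X_sim):
        Z_host = transform_host(X_sim[:, :-1])           # host's f_h and g_h
        toggle = X_sim[:, [-1]]                          # 1 -> sample, 0 -> reference
        Z_guest = toggle * z_guest_x + (1 - toggle) * z_guest_ref
        return h.predict_proba(np.hstack([Z_host, Z_guest]))[:, 1]
    base, phi = kernel_shap(predict, x_aug, ref_aug)
    return base, phi[:-1], phi[-1]   # host attributions, aggregated "DC Features"
```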

4 Experiments

4.1 Experimental Setting

Experiments were conducted on the Census Income (Adult) dataset from the UCI Machine Learning Repository [29]. It consists of 48842 records of American adults extracted from the 1994 Census database. The prediction task is to determine whether an individual makes over $50K a year.

The preprocessing steps consisted of encoding all categorical values with labels, dropping the features fnlwgt and education as uninformative, shuffling, and separating train and test data. For the model, we used a k-Nearest Neighbors classifier with the number of neighbors set to 7 and the k-d tree as the search algorithm. Other Data Collaboration parameters included 9-dimensional intermediate representations (F), the same number of dimensions for the collaboration projection (G), and 2000 points of anchor data. These parameters were fine-tuned on the Adult data to improve the accuracy of the DC model. Unless stated otherwise, the random seed was set to zero.
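An approximate reconstruction of this setup is sketched below; the file name, target column, and split proportions are assumptions, since only the high-level steps are specified above.

```python
# Approximate reconstruction of the experimental setup described above.
# The file name "adult.csv", the target column "income", and the exact
# train/test split are assumptions not specified in the text.
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("adult.csv")                        # UCI Census Income (Adult)
df = df.drop(columns=["fnlwgt", "education"])        # dropped as uninformative
for col in df.select_dtypes(include="object").columns:
    df[col] = LabelEncoder().fit_transform(df[col])  # label-encode categoricals
X, y = df.drop(columns=["income"]), df["income"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, shuffle=True, random_state=0)
model = KNeighborsClassifier(n_neighbors=7, algorithm="kd_tree")
```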

The experiments were performed on a MacBook Pro with a 2.3 GHz 8-core Intel Core i9 processor and a Python 3.7 development environment.

4.2 Demonstration of Contradictory Explanations in Distributed Machine Learning

The purpose of this experiment is to verify the claim that the application of SHAP explainability independently by each party in distributed machine learning may result in contradictory outputs.

To demonstrate this, we split the training data so that User 1 holds 90% of the positive labels and User 2 holds the remaining 10%, setting aside 100 randomly selected samples for validation. As a result, User 1 has 7811 samples, 90% of which have positive labels, while User 2 has 24650 samples with only 3% positive labels. This produces a biased dataset in the possession of User 1, with a higher expected prediction value compared to User 2, which is reflected in the SHAP output for the two users. After the data was split, we trained a Data Collaboration model between the two simulated users and used it to predict the Income variable. Next, the KernelSHAP method (Algorithm 1) was applied to the DC model and the validation data, taking User 1's training data as a baseline; then the proposed Horizontal DC-SHAP method (Algorithm 2) was applied, using the shared anchor data as a baseline. The same process was repeated for User 2.
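One plausible way to reproduce this label-skewed split is sketched below; the exact sampling procedure is not specified, so the share of negatives given to User 1 is an assumption chosen to roughly match the reported sample counts.

```python
# One plausible reconstruction of the label-skewed split: User 1 receives
# 90% of the positive records plus a small share of negatives, User 2 gets
# the rest. The exact procedure used in the experiment may differ.
import numpy as np

rng = np.random.default_rng(0)
pos = rng.permutation(np.flatnonzero(y_train.to_numpy() == 1))
neg = rng.permutation(np.flatnonzero(y_train.to_numpy() == 0))
n_pos_u1 = int(0.9 * len(pos))
n_neg_u1 = n_pos_u1 // 9            # makes User 1 roughly 90% positive
idx_user1 = np.concatenate([pos[:n_pos_u1], neg[:n_neg_u1]])
idx_user2 = np.concatenate([pos[n_pos_u1:], neg[n_neg_u1:]])
```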

Figure 4 shows two samples selected from the validation set that display contradictory explanations when the KernelSHAP algorithm is applied. For instance, in Fig. 4 (top), the attribution of the “Marital Status” feature by User 1 is positive, while for User 2 it is negative by an equal amount. In Fig. 4 (bottom), the “Relationship” feature has a positive attribution for User 1 and a negative one for User 2. Applying the DC-SHAP algorithm resolves these contradictions. It should be noted that the presented cases were hand-picked and the setting was engineered as a worst-case scenario for demonstration purposes. It is important to remember that in a real-life scenario of distributed machine learning, there will be no way to compare the explanations. Therefore, extreme cases like this should be carefully explored. In the next section, we verify the consistency of the proposed algorithm in general cases.

Fig. 4 Selected samples that demonstrate contradictory explanations. In all cases, the same DC model prediction is being explained for the same data instance. Explanations are given for class 1 predictions

4.3 Consistency of Explanations in Horizontal Collaboration

To test the consistency of the proposed feature attribution method in horizontal data collaboration, we split the training data between two users and trained a Data Collaboration model as described in Fig. 1. After that, we compared the explanation coefficients between the two users for the same 50 samples of the test set and reported the difference in the attribution of each feature, measured by the Root Mean Square Error (RMSE). This metric was preferred over the similar Mean Absolute Error (MAE) metric because RMSE gives higher weight to large errors and is recommended when large errors are particularly undesirable, as in our case. First, the simulated users obtain explanations for the samples of the test set using the KernelSHAP method as described in Algorithm 1, computing the reference value from their own share of the training data. Then, explanations for the same samples are obtained with the proposed Horizontal DC-SHAP method as described in Algorithm 2. The divergence of the explanations between the two users obtained by both methods is reported in Fig. 5. The experiment was repeated 10 times, with random seeds set from 0 to 9.
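A small sketch of the consistency metric follows, assuming the two users' attributions are collected into matrices with one row per explained test sample and one column per feature.

```python
# Per-feature consistency metric: RMSE between the attribution matrices of
# two users (rows = explained test samples, columns = features).
import numpy as np

def per_feature_rmse(phi_user1, phi_user2):
    diff = np.asarray(phi_user1) - np.asarray(phi_user2)
    return np.sqrt(np.mean(diff ** 2, axis=0))   # one RMSE value per feature
```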

Fig. 5 Differences in feature attributions among the two users in horizontal Data Collaboration, expressed as Root Mean Square Error for each feature

We then repeated the same experiment on other open-access datasets with similar learning tasks: Iris, Wine, and Heart Disease from the UCI Machine Learning Repository [30], and Pima Indians Diabetes [31]. We did not change the hyperparameters of the DC model apart from the target dimension for dimensionality reduction, which was adjusted to 3/4 of the original dimension of each dataset. Table 2 reports the results for each dataset averaged across all features. The results indicate better consistency in feature attribution among the users when the proposed method is used. In particular, the average discrepancy in feature attributions in terms of RMSE decreased across the datasets by at least a factor of 1.75.

Table 2 Differences in feature attributions among the two users in horizontal Data Collaboration, expressed as Root Mean Square Error averaged across all features on various open datasets

In our experimental setting, the calculation of feature attributions for one sample of the Adult dataset (12 features) averaged 1.88 s. It should be noted that the time complexity of the KernelSHAP algorithm is \(O(F2^M)\), where F is the cost of a function evaluation and M is the number of features, so the computational cost grows exponentially with the number of features. While most function evaluations can be made in constant time, the time complexity of k-Nearest Neighbors grows with the number of training samples n as \(O(n\log (n))\).

4.4 Explanations in Vertical Collaboration

In Vertical DC, there is no problem with diverging baselines because the collaborating parties share the same set of training samples. The challenge here is that the parties have disjoint sets of features, and in order to evaluate the model, they have to follow a protocol of information exchange that does not leak private data and allows obtaining valid explanations of model predictions. Algorithms 3 and 4 achieve this by exchanging intermediate representations of the vertically partitioned inputs to the explainability algorithm and by balancing partial feature attributions with a unified DC Features indicator that reflects the combined effect of hidden features on the model output (see the example output in Fig. 7).

To demonstrate the proposed feature attribution method in vertical Data Collaboration, we let the host and guest parties share all rows of the Adult training data but with fully disjoint sets of features. In our experiments, the host was assigned features 0 to 5 and the guest features 6 to 11. With this partition, we trained a k-Nearest Neighbors classification model through Data Collaboration and obtained feature attributions for 100 test samples.

The results in Fig. 6 show feature attributions for all features as if requested by a third party, as well as the partial feature attribution results obtained by the host and the guest party, with external features aggregated into one DC Features indicator. It can be observed that the partial results are consistent with the results for all features.

Fig. 6 Feature attribution comparison in Vertical DC. Each dot represents a feature of one data instance. Features are listed in a fixed order to improve readability

Fig. 7 Example of a single-sample DC-SHAP output. The predicted value for this sample is 0.6 for class 1; feature values in red have a positive impact on the model prediction, while features in blue have a negative impact

5 Discussion

The experiments presented in the previous section were set up to validate the proposed algorithms for calculating SHAP values in different Data Collaboration settings. In particular, we expected to observe similar feature attributions for the same inputs when computed by different collaborators. However, we did not expect exact equality of feature attributions across the partitions, because the client-side data transformations necessary for Data Collaboration may influence feature attribution for each client. Such differences in feature attributions nevertheless correctly reflect the model behavior of each client, as follows from the consistency property of the SHAP algorithm presented in Sect. 2.2. During the first experiment, we deliberately created a scenario where two clients who had collaborated to train a model could receive different explanations for the same set of data points. To achieve this, we introduced a large label imbalance between the clients, with one client receiving significantly more data points of a particular label than the other. This situation is not uncommon in business applications such as healthcare, finance, and e-commerce.

For instance, in healthcare, different hospitals or clinics may serve varying populations with different health conditions. In finance, clients may have varying levels of fraudulent transactions or risky behavior in their datasets. Similarly, in e-commerce, clients may have customers with varying preferences or behaviors.

As the first experiment has shown, in a case of extreme label imbalance, post-hoc explainability methods such as SHAP, which use local data to probe a global model, can yield different explanations for the same data points. This can potentially create confusion for the owners of overlapping data samples and undermine trust in model correctness at the early stages of adopting a particular federated learning system.

To address this issue, we proposed a solution that is tailored to Data Collaboration, but similar approaches can be adopted for other methods of horizontal and vertical distributed machine learning. Two principles can be followed to enhance the consistency of explanations among the clients:

  1. Using a set of shared data points as reference data for post-hoc explainability in the horizontal setting (anchor data, in our case);

  2. Adding a combined attribution of guest features for explainability in a vertical setting to display attributions of host features in the correct proportion.

By following these principles, we can ensure that the model generates consistent and accurate explanations for all data points, thereby building trust in the model and improving its adoption in real-world scenarios.

6 Conclusion

In this paper, we addressed the practical issue of obtaining meaningful explanations of predictions made by models trained in a distributed setting. First, we gave an overview of existing methods combining distributed machine learning techniques with explainability algorithms. We then identified and demonstrated the problems with explainability in distributed machine learning that can impede the transparency of AI-based decision-making systems. We further addressed those problems by proposing an Explainable Data Collaboration Framework based on the model-agnostic additive feature attribution algorithm KernelSHAP and the Data Collaboration method of privacy-preserving distributed machine learning. The proposed framework consists of three algorithms that tackle different use-case scenarios of horizontal and vertical distributed machine learning, while ensuring that end-users and participants of the collaboration obtain consistent and accurate feature attributions for overlapping samples. The performance of the proposed algorithms was experimentally verified on several open-access datasets. Our method achieved a significant decrease (by at least a factor of 1.75) in feature attribution discrepancies among the users in horizontally partitioned machine learning and proved capable of obtaining explanations in vertically partitioned machine learning such that a partial feature view is consistent with the full feature view.

6.1 Limitations and Future Work

The presented method relies on the KernelSHAP algorithm, which is a brute-force method of obtaining Shapley values for any model. It is not suitable for high-dimensional data because it relies on constructing a power set of all features, which quickly becomes intractable as the number of features increases. In future work, we plan to adapt existing methods for approximating Shapley values, as well as more optimized versions of SHAP for specific models, such as TreeExplainer and DeepExplainer.

In addition, this work only addressed the Data Collaboration method of distributed machine learning, and some specifics of this method do not allow us to directly apply our work to other methods, such as Federated Learning. In particular, one of the steps in Data Collaboration is to share synthetic anchor data among the users, which we used to unify the baseline for feature attributions in the horizontal Data Collaboration setting. In Federated Learning, an alternative method of sharing a reference value without compromising privacy is needed. Likewise, different methods of vertically distributed machine learning require individual protocols to satisfy explanation consistency and privacy requirements: the algorithm we proposed for vertically partitioned data is specific to Data Collaboration, although the principle of representing hidden features through an aggregated indicator has previously been used for Federated Learning [10] and can be applied to other methods of vertically distributed machine learning.