1 Introduction

Artificial Intelligence (AI) has been applied as a powerful technology in various fields [10, 23, 24, 27], enabling an innovative society and changing our lifestyles. Although AI technology is booming, many problems remain unsolved, and the black box problem is the most pressing among them [2, 8, 16, 30, 33]. To understand AI models, Explainable AI (XAI) [5, 6, 19, 29] has become a crucial research topic. Generally, XAI models are of two types: intrinsic (rule-based) or post hoc [22]. Intrinsic models obtain interpretability by restricting the machine learning model itself, e.g., linear regression, logistic analysis, and Grad-CAM [28]. In contrast, post hoc models apply interpretation after training, such as Local Interpretable Model-agnostic Explanations (LIME) [26, 37] and SHAP [18, 20]. Rule-based intrinsic and post hoc models do not share standard metrics; therefore, they can only be compared through their factor rankings and cannot be combined directly. However, the kernel SHAP method makes ensembling of factor contributions possible.

The SHAP method has been used in various fields [1, 3, 4, 9, 12, 17, 31, 34, 35] and has been shown to be robust [21, 32, 36]. SHAP enables us to explain black box models and to identify the local and global reasons for a prediction or classification. SHAP methods come in two kinds: model-agnostic (kernel SHAP) and model-specific (Tree SHAP, Deep SHAP) [18, 20]. The model-specific SHAPs are designed to explain particular model classes, reducing the computation or approximation loss for complex models, but they can only be used for those models. In contrast, kernel SHAP can be applied to any model. However, does kernel SHAP explain models correctly and reliably? This needs to be clarified.

To overcome these challenges, we propose an ensemble XAI model that yields a more reliable factor importance ranking. Our proposed ensemble XAI model combines the models’ performance with their SHAP values. We first used kernel SHAP to explain various machine-learning models on six datasets. Then we combined the models’ accuracy and SHAP values using our proposed cross-ensemble methodology to obtain a more reliable factor ranking. Finally, we compared our results with those of other models. Our main contributions are summarized as follows:

  • We proposed an ensemble XAI methodology to calculate the factor importance ranking, which can be used for both classification and regression models.

  • Our proposed methodology ranks factor importance in a comparatively stable and reliable manner.

  • Our analysis identified new essential risk factors for diabetes based on non-objective-oriented census data in Japan.

  • Our study paves the way for trustworthy AI research.

The rest of this paper is organized as follows. Sect. 2 introduces the necessary background on SHAP and our proposed methodology. Sect. 3 describes the six datasets used. Sect. 4 presents the detailed results of our study. Sect. 5 discusses our findings and future research directions. Finally, Sect. 6 concludes the paper.

2 Methodology

Our proposed methodology is based on SHAP, which is an enhancement of LIME. We first introduce the basics of the LIME method (Subsect. 2.1: Introduction of LIME) and the general kernel SHAP method (Subsect. 2.2: Introduction of kernel SHAP) to establish the theoretical foundation of our proposed methodology. Then, we explain our proposed cross-ensemble feature ranking methodology in Subsect. 2.3: Cross-ensemble factor ranking.

2.1 Introduction of LIME

LIME is a concrete implementation of local surrogate models. Surrogate models are trained to approximate the predictions of the underlying model. Instead of training a global surrogate model, LIME focuses on training local surrogate models to explain individual predictions. LIME generates a new dataset consisting of perturbed samples and the corresponding predictions of the black box model. On this new dataset, LIME then trains an interpretable (generally linear) model, which is weighted by the proximity of the sampled instances to the instance of interest. Mathematically, local surrogate models with interpretability constraints can be expressed as follows:

$$\begin{aligned} \zeta (x) = arg\min _{g\in G} L\left( f, g, \pi _x \right) + \Omega \left( g\right) \end{aligned}$$
(1)

The explanation is defined as a model \(g \in G\), where G is a class of potentially interpretable models, such as linear models, decision trees, and falling rule lists. The domain of g is \(\lbrace 0,1 \rbrace ^{d}\), i.e., g acts over the presence or absence of the interpretable components. Moreover, \(\Omega (g)\) measures the complexity of g, e.g., for linear models, the number of nonzero weights. The model being explained is denoted \(f: {\mathbb {R}}^d \rightarrow {\mathbb {R}}\). LIME uses \(\pi _x\) as a proximity measure between an instance z and x to define a locality around x. Finally, let \(L(f, g, \pi _x)\) be a measure of how unfaithful g is in approximating f in the locality defined by \(\pi _x\). To ensure both interpretability and local fidelity, LIME minimizes \(L(f, g, \pi _x)\) while keeping \(\Omega (g)\) low enough to be interpretable by humans.

LIME minimizes the locality loss \(L(f, g, \pi _x)\) without making any assumptions about f, i.e., it is model-agnostic. Thus, to learn the local behavior of f as the interpretable inputs vary, LIME approximates \(L(f, g, \pi _x)\) by drawing samples weighted by \(\pi _x\). LIME samples instances around x by drawing nonzero elements of x uniformly at random (where the number of such draws is also uniformly sampled). Given a perturbed sample \(z^\prime \in \lbrace 0, 1\rbrace ^{d}\) (which contains a fraction of the nonzero elements of \(x\)), LIME recovers the sample in the original representation \(z \in {\mathbb {R}}^d\) and obtains \(f(z)\), which is used as a label for the explanation model. Given this dataset \(Z\) of perturbed samples with the associated labels, LIME optimizes Eq. 1 to obtain an explanation \(\zeta (x)\).

The steps for training a local surrogate model in LIME are as follows (a minimal code sketch is given after the list):

  • Select the instance of interest for which an explanation of its black box prediction is desired.

  • Perturb the dataset and get the black box predictions for these new points.

  • Weight the new samples according to their proximity to the instance of interest.

  • Train a weighted, interpretable model on the newly selected dataset with the variations.

  • Explain the prediction by interpreting the local model.
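As an illustration of these steps, the following minimal sketch (our own, not the authors’ implementation; the synthetic data, the random-forest black box, and the Gaussian perturbation and proximity kernel are assumptions made for the example) fits a weighted linear surrogate around one tabular instance:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Hypothetical data and black-box model (any classifier with predict_proba works).
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

def lime_explain(x, predict_proba, n_samples=1000, kernel_width=0.75):
    """Fit a weighted linear surrogate g around the instance x (cf. Eq. 1).

    Step 1: the instance of interest x is passed in by the caller.
    """
    d = x.shape[0]
    # Step 2: perturb the instance and query the black box for these new points.
    Z = x + rng.normal(scale=X.std(axis=0), size=(n_samples, d))
    f_z = predict_proba(Z)[:, 1]
    # Step 3: weight the samples by proximity to x (an exponential kernel pi_x).
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # Step 4: train a weighted, interpretable (linear) surrogate model.
    surrogate = Ridge(alpha=1.0).fit(Z, f_z, sample_weight=weights)
    # Step 5: the coefficients are the local explanation for x.
    return surrogate.coef_

print(lime_explain(X[0], black_box.predict_proba))
```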

2.2 Introduction of kernel SHAP

The goal of SHAP is to explain the prediction for an instance x by computing each feature’s contribution to the prediction of a model. The SHAP explanation method computes Shapley values from coalitional game theory. The feature values of a data instance x act as players in a coalition, and Shapley values tell us how to distribute the prediction fairly among the features. SHAP specifies the explanation as:

$$\begin{aligned} f(x) = g\left( z^\prime \right) = \phi _0 + \sum \limits _{i=1}^M \phi _i z_i^\prime \end{aligned}$$
(2)

where \(z^\prime \in \lbrace 0,1\rbrace ^M\), M is the number of simplified input features, and \(\phi _i \in {\mathbb {R}}\). The vector \(z^\prime\) is the simplified (coalition) representation of \(x\), and M equals the dimensionality of the original feature space. In kernel SHAP, \(g\left( z^\prime \right)\) is a linear model. Unlike in LIME, the contribution of feature i to the explanation of \(x\) becomes

$$\begin{aligned} \phi _i\left( f,x\right) = \sum \limits _{z^\prime \subseteq x } \pi _x (z^\prime ) \left[ f_x(z^\prime ) -f_x(z^\prime \setminus {i}) \right] \end{aligned}$$
(3)

where \(z^\prime\) is a coalition vector representing \(x\). \(f_x(z^\prime )\) is the model output when \(z_i^\prime\) is 1 (feature i present), while \(f_x(z^\prime {\setminus }{i})\) is the model output when \(z_i^\prime\) is 0 (feature i absent).

In the kernel SHAP method,

$$\begin{aligned} \Omega (g) = 0 \end{aligned}$$
(4)

and the kernel of \(\pi _x\) becomes

$$\begin{aligned} \pi _x (z^\prime ) = \frac{(M-1)}{\left( {\begin{array}{c}M\\ \vert z^\prime \vert \end{array}}\right) \vert z^\prime \vert \left( M- \vert z^\prime \vert \right) } \end{aligned}$$
(5)

where \(\vert z^\prime \vert\) is the number of nonzero entries in \(z^\prime\), and \(z^\prime \subseteq x\) denotes all \(z^\prime\) vectors whose nonzero entries are a subset of the nonzero entries of \(x\).
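As a brief worked example of Eq. 5 (illustrative values only, not from the paper): for M = 4 features and a coalition with \(\vert z^\prime \vert = 2\) present features,

$$\begin{aligned} \pi _x (z^\prime ) = \frac{4-1}{\left( {\begin{array}{c}4\\ 2\end{array}}\right) \cdot 2 \cdot (4-2)} = \frac{3}{24} = 0.125, \end{aligned}$$

whereas coalitions with \(\vert z^\prime \vert = 1\) or 3 receive weight 0.25. Coalitions of intermediate size thus receive comparatively low weight, while nearly empty or nearly full coalitions receive the largest weights.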

The explanation model \(g(z^\prime )\) matches the original model \(f(x)\) when \(x=h_x(z^\prime )\), where \(\phi _0=f(h_x({\textbf {0}}))\) corresponds to the model output when all simplified features are absent. Therefore, the loss function of kernel SHAP becomes

$$\begin{aligned} L \left( {\widehat{f}}, g, \pi _x \right) = \sum _{z^\prime \in Z} \left[ {\widehat{f}}\left( h_x \left( z^\prime \right) \right) - g\left( z^\prime \right) \right] ^2 \pi _x \left( z^\prime \right) \end{aligned}$$
(6)

Kernel SHAP estimates the contributions for an instance \(x\) in five steps (a usage sketch follows the list):

  • Sample coalitions \(z_k^\prime \in \lbrace 0,1 \rbrace ^M\), \(k\in \lbrace 1,\dots ,K \rbrace\), where M is the number of features (1: feature present in the coalition, 0: feature absent)

  • Get the prediction for each \(z_k^\prime\) by first mapping \(z_k^\prime\) back to the original feature space and then applying the model \({\widehat{f}}\): \({\widehat{f}}\left( h_x(z_k^\prime ) \right)\)

  • Compute the weight for each \(z_k^\prime\) with the SHAP kernel (Eq. 5)

  • Fit a weighted linear model

  • Return Shapley values \(\phi _i\): the coefficients from the linear model.
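The following minimal usage sketch (ours, not the paper’s experiments) corresponds to the five steps above; it assumes the shap package’s KernelExplainer and shap.sample utilities, whose exact return shapes may vary slightly across versions, and uses placeholder data and a placeholder model:

```python
import numpy as np
import shap
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder data and model (not the paper's datasets).
X = rng.normal(size=(300, 5))
y = (X[:, 0] - X[:, 2] > 0).astype(int)
model = LogisticRegression().fit(X, y)

def f(data):
    """Single-output wrapper: probability of the positive class."""
    return model.predict_proba(data)[:, 1]

# A small background sample represents "feature absent" when forming coalitions.
background = shap.sample(X, 50)
explainer = shap.KernelExplainer(f, background)

# Local explanation for one instance: one phi_i per feature.
phi = explainer.shap_values(X[:1])

# Additivity check (Eq. 2): phi_0 + sum_i phi_i should approximate f(x).
print(explainer.expected_value + phi.sum(), f(X[:1])[0])
```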

Even though kernel SHAP can help us understand each factor’s contribution within each model, the factor ranking differs from model to model (Fig. 2). Theoretically, because kernel SHAP is based on an approximate calculation, the kernel SHAP values of different methods will differ. Therefore, a single kernel SHAP value cannot represent the true ranking of factor importance, even though the kernel SHAP method helps us explain the models and even reveals relations among factors. In particular, the importance ranking of critical factors is vital in healthcare and medicine. Therefore, our proposed cross-ensemble factor importance ranking becomes necessary and meaningful.

2.3 Cross-ensemble factor ranking

From the introduction of kernel SHAP and Eq. 6, we can see that the approximated linear model uses only the predicted output of one model (details in Subsect. 2.1), and this predicted output is affected by how good the prediction model is. Meanwhile, because the model-agnostic explanation method only approximates the prediction outputs of models, the kernel SHAP values of all models share the same metric when we analyze one dataset. Our tests (Fig. 2) show that the factors’ ranking in each model changes when we use kernel SHAP to analyze a single dataset. Therefore, a better factor contribution ranking method is needed, and our proposed methodology addresses this problem.

Moreover, the quality of each model should also be considered when calculating factor importance. Thus, we use the models’ accuracies to adjust the ranking of factors (Eq. 8): if one model has higher accuracy, it carries more weight in the final factor importance ranking. Accordingly, we propose the cross-ensemble factor contribution calculation shown in Eq. 9.

$$\begin{aligned} I_j= \sum _{i=1}^{k}\phi _i \end{aligned}$$
(7)

That is, the importance of a factor in model j is the sum of all its local contributions.

$$\begin{aligned} W_j= & {} \frac{\exp (Acc_j)}{\sum _{j=1}^{N} \exp (Acc_j)} \end{aligned}$$
(8)
$$\begin{aligned} I= \sum _{j=1}^{N} W_j * I_j \end{aligned}$$
(9)

Here, \(Acc_j\) is the accuracy of classification or regression model j, N is the number of analytical approaches applied to one dataset, and \(I_j\) is the factor importance obtained from approach j.

Our proposed methodology uses N approaches to analyze one dataset and obtains the kernel SHAP values of each approach. Then the kernel SHAP values of N−1 approaches are chosen iteratively from the N approaches to calculate the ensemble factor importance using Eq. 9. Finally, the average of all N ensemble iterations is used as the final cross-ensemble factor importance ranking. In our proposed methodology, the performance (accuracy) of each model is taken into account by weighting the factor contributions (Eq. 8); a model with relatively high accuracy is given a higher weight in our factor ranking calculation. A minimal code sketch of this computation follows.
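The sketch below reflects our reading of Eqs. 7–9 and the leave-one-out averaging described above; the function and argument names are ours, not the authors’, and the softmax weights are renormalized over each subset of N−1 models (an assumption):

```python
import numpy as np

def ensemble_importance(shap_values, accuracies):
    """Cross-ensemble factor importance (a sketch of Eqs. 7-9).

    shap_values: dict model_name -> array (n_samples, n_features) of local SHAP values
    accuracies:  dict model_name -> accuracy (or R^2 for regression) of that model
    """
    names = list(shap_values)
    # Eq. 7: per-model global importance = sum of the local contributions.
    I = {m: shap_values[m].sum(axis=0) for m in names}

    def weighted_importance(subset):
        # Eq. 8: softmax weights over the subset's accuracies (renormalized).
        acc = np.array([accuracies[m] for m in subset])
        w = np.exp(acc) / np.exp(acc).sum()
        # Eq. 9: accuracy-weighted combination of per-model importances.
        return sum(w_j * I[m] for w_j, m in zip(w, subset))

    # Leave-one-out: combine every subset of N-1 models, then average the N results.
    rounds = [weighted_importance([m for m in names if m != left_out])
              for left_out in names]
    return np.mean(rounds, axis=0)
```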

The overall calculation flow is shown in Fig. 1. First, nine general and robust machine learning classification models were applied to the five classification-task datasets: logistic regression, naive Bayes classification, quadratic discriminant analysis (QDA), k-nearest neighbors classification, AdaBoost, decision tree (DT), random forest classification, XGBoost, and multi-layer perceptron (MLP) classification. Furthermore, six regression methods were used for the regression-task dataset (household dataset). Then, for each dataset, kernel SHAP was used to explain each model and obtain the contribution (global SHAP value) of each factor, and the resulting importance ranking of factors was reviewed (Fig. 2). After obtaining the kernel SHAP values of the factors, we used our proposed methodology to calculate the final factor importance ranking. Finally, we compared our proposed methodology’s results with those of XGBoost, as shown in Sect. 4. A condensed code sketch of this flow is given after Fig. 1.

Fig. 1 Our proposed methodology flow chart
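For illustration, a condensed version of this flow for one classification dataset might look as follows (a sketch with an abbreviated model list and a placeholder scikit-learn dataset, not the paper’s experiments); the resulting accuracies and SHAP values feed the cross-ensemble step of Subsect. 2.3:

```python
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Placeholder dataset (not one of the paper's six); 0.7/0.3 split as in Sect. 3.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.7, random_state=0)

# Abbreviated model zoo (the paper uses nine classifiers).
models = {
    "logistic": LogisticRegression(max_iter=5000),
    "naive_bayes": GaussianNB(),
    "random_forest": RandomForestClassifier(random_state=0),
}

background = shap.sample(X_tr, 50)
acc, shap_vals = {}, {}
for name, clf in models.items():
    clf.fit(X_tr, y_tr)
    acc[name] = clf.score(X_te, y_te)

    def f(data, clf=clf):
        return clf.predict_proba(data)[:, 1]

    explainer = shap.KernelExplainer(f, background)
    # Explain a small test subsample to keep Kernel SHAP tractable.
    shap_vals[name] = explainer.shap_values(shap.sample(X_te, 20))

# acc and shap_vals are the inputs to the cross-ensemble step (Eqs. 7-9),
# e.g. the ensemble_importance sketch shown in Subsect. 2.3.
```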

3 Data source

Five open datasets and one non-open dataset (Table 1) were used to test our proposed methodology. Among the datasets used, four open datasets are for classification: the Pima Indians Diabetes database (PIDD) [13], the Mendeley open diabetes dataset [25], the date fruit dataset [15], and the heart disease dataset [7], while the open household dataset [14] is for the regression task. All open datasets can be downloaded from the internet. PIDD is a small diabetes dataset containing 768 samples and eight diabetes-related factors. Similarly, the Mendeley open diabetes dataset contains 11 risk factors for diabetes, such as BMI, HbA1c, and age, while the heart disease dataset has 17 factors. The housing dataset is a commonly used regression dataset containing nine factors. Our proposed methodology was tested on these classification and regression datasets. Moreover, the proposed method was also used to analyze census data from the Ministry of Health, Labour and Welfare (MHLW) [11], aiming to identify possible new risk factors associated with diabetes (Table 1). Because the MHLW dataset is not objective-oriented, we used the newest MHLW data (2018) and removed samples with null values. After this pre-processing, 12,736 balanced samples were analyzed; a sketch of the pre-processing is given below.
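A minimal sketch of this pre-processing, assuming a hypothetical file name, a hypothetical target column, and class balancing by downsampling the majority class (the actual MHLW schema and balancing procedure are not reproduced here):

```python
import pandas as pd

# Hypothetical file and column names; the MHLW 2018 schema is an assumption.
df = pd.read_csv("mhlw_2018.csv")

# Remove samples containing null values, as described above.
df = df.dropna()

# Balance the classes by downsampling the majority class
# (the paper reports 12,736 balanced samples after pre-processing).
target = "diabetes"  # hypothetical target column name
n_minority = df[target].value_counts().min()
df_balanced = df.groupby(target).sample(n=n_minority, random_state=0)
```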

For all models, the samples of each dataset were divided into training (0.7) and test (0.3) sets. The performance of all models is shown in Table 2. Kernel SHAP was then used to explain each model separately, and the factors’ importance ranking for each model was reviewed (Fig. 2). We then applied the proposed methodology to calculate the new factor ranking, whose results are shown in the results section. Moreover, our proposed methodology’s results were compared with those of XGBoost; the comparison is shown in Fig. 4.

Table 1 Description of the datasets analyzed in this study
Table 2 Performance of the classification models on the classification-task datasets in our study

4 Results

After using kernel SHAP to explain the models and testing our proposed methodology on six datasets, we can clearly assess the efficiency of our proposed methodology. Our results are divided into two parts. Subsect. 4.1 presents the results of single kernel SHAP, which show its poor stability. Subsect. 4.2 presents the results of our proposed ensemble SHAP values, together with the comparison of our methodology against the XGBoost method, where the difference is apparent.

4.1 Results of kernel SHAP

Even though SHAP can help us understand the factor contributions within each model, the results of single kernel SHAP (Fig. 2) show that both the factor contribution values and the factor orders differ across models. The factors’ importance rankings also differ, especially for typical well-established high-risk factors such as age and gender in the Mendeley diabetes dataset. Similarly, the factor contribution order in the PIDD dataset also varies, specifically for BMI, age, and the diabetes pedigree function. In the PIDD dataset, age, an essential risk factor for diabetes, is the lowest-contributing factor in the MLP model and the second lowest in DT and XGBoost. Similarly, the obesity factor in the MHLW dataset makes the lowest and second lowest contribution to diabetes in the DT and logistic models, respectively, whereas it makes the highest contribution in the QDA model and the second largest contribution in the naive Bayes model. The other factors also have different importance ranks in each model on the MHLW dataset. Moreover, room space, number of rooms, and public pension become comparatively important factors in the DT and XGBoost models. The PIDD, date fruit, and household datasets show similar situations: the factor orders in the classification or regression (household dataset) models change from model to model (Fig. 2). All the single kernel SHAP results show that a single SHAP value cannot represent the true importance ranking of the factors, even though the kernel SHAP method helps us explain the models and reveal relations among factors. Since knowing the factors’ ranking is essential in healthcare and medicine, our proposed cross-ensemble factor importance calculation methodology is instrumental.

Fig. 2 The kernel SHAP values for various datasets

4.2 Results of ensemble factors importance

After applying our proposed methodology, the factor contribution ranking became stable across all datasets (Fig. 3). Moreover, we also compared our results with another robust feature ranking method, XGBoost. The results in Fig. 3 show that the final factor importance order of our proposed methodology is consistent with human knowledge and better than that of the XGBoost method.

Meanwhile, Fig. 4 shows that our proposed ensemble XAI (SHAP) methodology can highlight the differences among factors, giving the essential factors a high ranking while placing the less important factors lower. Especially in the final ensemble factor importance for the household data, only one factor (total bedrooms) is paramount, whereas the factors have nearly the same level of importance in the XGBoost analysis. Similarly, compared with the XGBoost results, the separation among factors produced by our proposed methodology is also more distinct in the other five datasets. Notably, the factors’ importance ranking in the MHLW dataset reveals some significant knowledge: the room space factor is more important than the generally known factors age, gender, and obesity. Similarly, alcohol drinking is more important than smoking. Meanwhile, the mental-health situation is more critical than the obesity factor in our analysis (Fig. 4). This alerts Japanese people to pay attention to their mental health to help prevent diabetes.

Fig. 3 Our proposed ensemble SHAP values for various datasets

Fig. 4 Our proposed ensemble SHAP results compared with XGBoost

5 Discussion

In this study, we compared the results of kernel SHAP for various machine learning models and found that a single SHAP model cannot explain the models at the level of human knowledge. We therefore proposed an ensemble XAI-based factor contribution ranking methodology and verified its effectiveness. Our results confirm that the proposed method solves the problem of unstable factor importance rankings in both classification and regression models. Furthermore, in each tested dataset, the factors’ importance orders became stable under our ranking procedure, which shows that the proposed methodology is efficient in calculating the factors’ importance ranking. Moreover, the resulting factor order is consistent with ordinary human knowledge.

Compared with the general single kernel SHAP method, the proposed methodology offers a comparatively stable and reliable factor importance ranking. Moreover, in some areas, a reliable factor importance ranking supplies efficient guidance. For example, in the diabetes and heart disease datasets, besides the commonly known insulin and HbA1c factors, age is the third most important factor in the Mendeley diabetes dataset, which indicates that older people should be more careful about their health. Similarly, in the PIDD dataset, BMI is more important than age, which suggests that preventing the spread of diabetes, inspiring people to care more about their health, and preventing obesity are urgent. Notably, the ensemble XAI results on the MHLW dataset show that Japanese citizens should pay more attention to their living conditions (house space, drinking, and smoking) and mental status to prevent diabetes.

Indeed, there are some limitations to our study. At present, only SHAP is used to combine the factor contributions; more reliable explanation methods will be incorporated in future studies. Meanwhile, only the factor importance ranking problem was addressed in our research; a prospective study must also consider how to explain the correlations among factors efficiently. Nevertheless, our proposed methodology yields a steady factor importance ranking, which helps us obtain comparatively reliable explainable results.

6 Conclusion and future work

This study compared the results of kernel SHAP for various machine learning models and found that a single SHAP model cannot explain the models at the level of human knowledge. We therefore proposed an ensemble XAI-based factor ranking methodology and verified its effectiveness on six datasets. Our proposed methodology solves the unstable factor importance ranking problem of kernel SHAP and offers a stable and more reliable factor importance ranking for classification and regression models. Our study paves the way for building reliable AI models. Furthermore, our study also identified some significant factors of diabetes. Our future studies will focus on building trustworthy AI models by using the knowledge identified with XAI methods. We will also explore the possibility of building knowledge-based small AI models.