Abstract
Discriminating the role of input variables in a hydrological system or in a multivariate hydrological study is particularly useful. Nowadays, emerging tools, called feature importance measures, are increasingly being applied in hydrological applications. In this study, we propose a virtual experiment to fully understand the functionality and, most importantly, the usefulness of these measures. Thirteen importance measures related to four general classes of methods are quantitatively evaluated to reproduce a benchmark importance ranking. This benchmark ranking is designed using a linear combination of ten random variables. Synthetic time series with varying distribution, cross-correlation, autocorrelation and random noise are simulated to mimic hydrological scenarios. The obtained results clearly suggest that a subgroup of three feature importance measures (Shapley-based feature importance, derivative-based measure, and permutation feature importance) generally provide reliable rankings and outperform the remaining importance measures, making them preferable in hydrological applications.
Acknowledgements
This research has received no external funding.
Author information
Contributions
Both authors performed the experiments, analyzed the data and wrote the manuscript. Both authors reviewed and approved the final version of the manuscript.
Ethics declarations
Conflict of interests
The authors declare no conflict of interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
A Feature importance measures in detail
We introduce here the notation and the main notions needed to define the measures used in this work. Let Y and \(\textbf{X}=(X_1,\dots ,X_d)\) be a random variable and a random vector on the probability space \((\Omega ,{\mathcal {B}}(\Omega ), {\mathbb {P}})\), with \(\textbf{X} \in {\mathcal {X}}_d \subseteq {\mathbb {R}}^d\) and \(Y \in {\mathcal {Y}} \subseteq {\mathbb {R}}\), cumulative distribution functions \({\mathbb {P}}_X\) and \({\mathbb {P}}_Y\), and respective density functions \(p_X\) and \(p_Y\). The vector \({\textbf{X}}\) can be written as \(\textbf{X}=(X_i,{\textbf{X}}_{-i})\), where \(\textbf{X}_{-i}=\{X_l,\,l=1,\dots ,d,\,l\ne i\}\). We denote the vector of observed values of \(X_i\) as \({\textbf{x}}_i=\{x_i^{(1)},\dots,x_i^{(N)}\}'\) and the jth observation as \(\textbf{x}^{(j)}=\{x_1^{(j)},\dots,x_d^{(j)}\}\), associated with the corresponding target value \(y^{(j)} \in {\mathcal {Y}}\). For the computation of the ML feature importance measures, we adopt an ML model \({\widehat{g}}\) (here, a linear model) to approximate the unknown model. The chosen ML model is fitted on the training set \(\{(\textbf{x}^{(j)},y^{(j)})\}_{j=1}^N\). We use \(L({\widehat{g}})={\mathbb {E}}[{\mathcal {L}}(Y,{\widehat{g}}({\textbf{X}}))]\) to denote the generalization error of a trained ML model, where \({\mathcal {L}}:{\mathcal {Y}}\times {\mathbb {R}} \rightarrow {\mathbb {R}}_{+}\) is the loss function.

The notion of Shapley value arises from cooperative game theory. Consider a group of players \(D=\{1,\dots ,d\}\) who can join coalitions \(K\subseteq D\); the total number of possible coalitions is \(2^d\). We denote by \(v:2^D\rightarrow {\mathbb {R}}_{+}\) the value function that assigns a reward to each coalition. The reward attributed to player i is given by its Shapley value

$$\begin{aligned} \phi _i(v)=\sum _{K\subseteq D \setminus \{i\}}\frac{|K|!\,(|D|-|K|-1)!}{|D|!}\left[ v(K\cup \{i\})-v(K)\right] . \end{aligned}$$
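As a concrete illustration of the combinatorial weighting \(|K|!\,(|D|-|K|-1)!/|D|!\) — a minimal sketch, not part of the original study — the Shapley value of a small game can be computed exactly by enumerating all coalitions. The additive value function below is a hypothetical example:

```python
from itertools import combinations
from math import factorial

def shapley_values(d, v):
    """Exact Shapley values phi_i for a d-player game with value function v,
    where v maps a frozenset of player indices to a real-valued reward."""
    phi = []
    for i in range(d):
        others = [p for p in range(d) if p != i]
        total = 0.0
        for size in range(d):
            for K in map(frozenset, combinations(others, size)):
                # Combinatorial weight |K|! (|D|-|K|-1)! / |D|!
                w = factorial(len(K)) * factorial(d - len(K) - 1) / factorial(d)
                total += w * (v(K | {i}) - v(K))  # marginal contribution of i
        phi.append(total)
    return phi

# Hypothetical additive game: a coalition's reward is the sum of its members'
# weights, so each player's Shapley value recovers its own weight.
weights = [1.0, 2.0, 3.0]
v = lambda K: sum(weights[p] for p in K)
phi = shapley_values(3, v)  # ≈ [1.0, 2.0, 3.0]
```

For an additive game the marginal contribution of player i is constant, so the weights sum to one and each \(\phi _i\) equals the player's own weight; interactions between players would redistribute the rewards.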
Regarding the ALE function, its estimation requires partitioning the support of the feature of interest into K non-overlapping intervals \({\mathcal {X}}_i^k=[z_i^{k-1},z_i^k]\), with \(k=1,\dots ,K\). The estimate of the ALE function of feature \(X_i\) can be written as
for each \(x_i \in \left[z_{i}^0,z_{i}^K\right]\), where \(z_{i}^0=\min \left\{x_i^{(1)},\dots,x_i^{(N)}\right\}\) and \(z_{i}^K=\max \left\{x_i^{(1)},\dots,x_i^{(N)}\right\}\). The term \(N_i^k\) denotes the number of data points falling in the kth interval for feature \(X_i\). We now report the feature importance measures employed in our analysis.
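To make this estimation scheme concrete, the following sketch (a hypothetical Python implementation, not the authors' code; the quantile-based interval edges and the toy model are assumptions) accumulates the averaged local differences over the K intervals:

```python
import numpy as np

def ale_estimate(g, X, i, n_intervals=10):
    """Uncentered first-order ALE estimate for feature i of a fitted model g.
    X is an (N, d) array; g maps such an array to a vector of N predictions."""
    # Interval edges z_i^0 < ... < z_i^K taken at empirical quantiles of X_i.
    z = np.quantile(X[:, i], np.linspace(0.0, 1.0, n_intervals + 1))
    # Assign every observation to one of the K intervals.
    idx = np.clip(np.searchsorted(z[1:-1], X[:, i]), 0, n_intervals - 1)
    ale = np.zeros(n_intervals + 1)
    for k in range(n_intervals):
        ale[k + 1] = ale[k]          # carry the accumulated effect forward
        mask = idx == k              # the N_i^k points falling in interval k
        if mask.any():
            lo, hi = X[mask].copy(), X[mask].copy()
            lo[:, i], hi[:, i] = z[k], z[k + 1]
            ale[k + 1] += np.mean(g(hi) - g(lo))
    return z, ale

# Toy check: for a model additive and linear in X_0, the ALE of X_0 is linear.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
g = lambda A: 2.0 * A[:, 0] + A[:, 1]   # hypothetical fitted model \hat{g}
edges, ale = ale_estimate(g, X, i=0)
```

Replacing the feature value by the two interval edges before differencing is what keeps the estimate free of the extrapolation bias that plain partial-dependence averaging incurs on correlated inputs.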
1. Correlation methods

- Pearson correlation coefficient \(r_i\) of feature \(X_i\) is defined as follows

$$\begin{aligned} r_i = \frac{{\mathbb {E}}(YX_i)-{\mathbb {E}}(X_i){\mathbb {E}}(Y)}{\sqrt{{\mathbb {E}}(X_i^2)-({\mathbb {E}}(X_i))^2}\sqrt{{\mathbb {E}}(Y^2)-({\mathbb {E}}(Y))^2}} = \frac{\text {cov}(Y, X_i)}{\sigma _{X_i}\sigma _{Y}}, \end{aligned}$$(7)

where \(\text {cov}(Y,X_i)\) denotes the covariance between Y and \(X_i\), while \(\sigma _{X_i}\) and \(\sigma _Y\) denote the standard deviations of \(X_i\) and Y, respectively.
- Spearman’s rank correlation coefficient \(\rho _i\) of feature \(X_i\) can be written as

$$\begin{aligned} \rho _i=\frac{\text {cov}(\text {Rank}(Y), \text {Rank}(X_i))}{\sigma _{\text {Rank}(X_i)}\sigma _{\text {Rank}(Y)}}. \end{aligned}$$(8)

Note that the formula above is the same as Pearson’s coefficient, but applied to the ranks of Y and \(X_i\).
- Kendall rank correlation coefficient \(\tau _i\) of feature \(X_i\) can be written as

$$\begin{aligned} \tau _i=\frac{n_c-n_d}{\sqrt{n_0-n_1}\sqrt{n_0-n_2}}, \end{aligned}$$(9)

where \(n_c\) is the number of concordant pairs, \(n_d\) is the number of discordant pairs, \(n_0\) is the total number of data pairs, and \(n_1\) and \(n_2\) are the numbers of pairs tied in \(X_i\) and in Y, respectively.
2. Regression-based methods

- Standardized regression coefficient \(\text {SRC}_i\) of feature \(X_i\) is defined as

$$\begin{aligned} \text {SRC}_i=\frac{\beta _i\sigma _{X_i}}{\sigma _Y}, \end{aligned}$$(10)

where \(\beta _i\) is the coefficient of \(X_i\) in the linear model.
3. ML feature importance measures

- Permutation feature importance \(\text {PFI}_i\) of feature \(X_i\) is defined as

$$\begin{aligned} \text {PFI}_i = {\mathbb {E}}[{\mathcal {L}}(Y,{\widehat{g}}(X_i^{\pi }, \textbf{X}_{-i}))]-{\mathbb {E}}[{\mathcal {L}}(Y,{\widehat{g}}(X_i, \textbf{X}_{-i}))], \end{aligned}$$(11)

where \(X_i^\pi\) is a permuted version of \(X_i\) with the same marginal distribution, and \({\mathbb {E}}[\cdot ]\) is the expectation operator.
- Permute-and-relearn feature importance \(\text {VI}_i^{\pi \text {L}}\) of feature \(X_i\) is defined as

$$\begin{aligned} \text {VI}_i^{\pi \text {L}}= {\mathbb {E}}[{\mathcal {L}}(Y, {\widehat{g}}^{\pi ,i}(X_i, \textbf{X}_{-i}))]-{\mathbb {E}}[{\mathcal {L}}(Y, {\widehat{g}}(X_i, \textbf{X}_{-i}))], \end{aligned}$$(12)

where \({\widehat{g}}^{\pi ,i}\) is the ML model retrained after the permutation of \(X_i\).
- Shapley-based feature importance \(\text {SbFI}_i\) of feature \(X_i\) is defined as

$$\begin{aligned} \text {SbFI}_i = \sum _{K\subseteq D \setminus \{i\}}\frac{|K|!\,(|D|-|K|-1)!}{|D|!}\left[ {\widehat{g}}_{K\cup \{i\}}(\textbf{x}_{K\cup \{i\}})-{\widehat{g}}_K(\textbf{x}_{K})\right] , \end{aligned}$$(13)

where \({\widehat{g}}_K ({\textbf{x}}_K )={\mathbb {E}}_{{\textbf{X}}_C } [{\widehat{g}}({\textbf{x}}_K,{\textbf{X}}_C )]\), with C being the complement of K.
- Shapley feature importance \(\text {SFIMP}_i\) of feature \(X_i\) is defined using a value function based on the model generalization error, i.e.,

$$\begin{aligned} \text {SFIMP}_i = \sum _{K\subseteq D \setminus \{i\}}\frac{|K|!\,(|D|-|K|-1)!}{|D|!}\left[ {\widehat{L}}_{K\cup \{i\}}({\widehat{g}})- {\widehat{L}}_{K}({\widehat{g}})\right] , \end{aligned}$$(14)

where \({\widehat{L}}_{K}({\widehat{g}})=\frac{1}{N}\sum _{j=1}^N \sum _{k=j}^N {\mathcal {L}}\left( {\widehat{g}}(\textbf{x}_K^{(j)}, {\textbf{x}}_C^{(k)}), y^{(j)}\right)\).
- Derivative-based measure \(\kappa _i^{\text {ALE}}\) of feature \(X_i\) is defined as

$$\begin{aligned} \kappa _i^{\text {ALE}} = \frac{1}{K}\sum _{k=1}^{K} {\mathbb {E}}\left[ \frac{{\widehat{g}}\left( z_i^k, \textbf{x}_{-i}^{(j)}\right) -{\widehat{g}}\left( z_i^{k-1}, \textbf{x}_{-i}^{(j)}\right) }{z_i^k-z_i^{k-1}}\right] ^2\frac{\sigma _{X_i}^2}{\sigma _Y^2}. \end{aligned}$$(15)

- ALE-based feature importance \(\text {FI}^{\text {ALE}}_i\) of feature \(X_i\) is defined as

$$\begin{aligned} \text {FI}^{\text {ALE}}_i= \sqrt{{\mathbb {V}}(\text {ALE}_i(X_i))}, \end{aligned}$$(16)

where \({\mathbb {V}}[\cdot ]\) denotes the variance operator.
4. SA methods

- Variance-based sensitivity measure \(\eta _i^2\) of feature \(X_i\) is defined as

$$\begin{aligned} \eta _i^2=\frac{{\mathbb {V}}[Y] -{\mathbb {E}}_{X_i}[{\mathbb {V}}[Y \mid X_i]]}{{\mathbb {V}}[Y]}. \end{aligned}$$(17)

- Density-based sensitivity measure \(\delta _i\) of feature \(X_i\) can be written as

$$\begin{aligned} \delta _i=\frac{1}{2}{\mathbb {E}}_{X_i}\left[ \int _{{\mathcal {Y}}} |p_Y(y)-p_{Y\mid X_i}(y) |dy\right] , \end{aligned}$$(18)

where \(p_Y\) and \(p_{Y\mid X_i}\) are the marginal output density and the conditional output density, respectively.
- Cumulative distribution-based sensitivity measure \(\beta ^{\text {KS}}_i\) of feature \(X_i\) can be written as

$$\begin{aligned} \beta ^{\text {KS}}_i={\mathbb {E}}_{X_i}\left[ \sup _{y\in {\mathcal {Y}}}\left| {\mathbb {P}}_Y(y)-{\mathbb {P}}_{Y\mid X_i }(y)\right| \right] , \end{aligned}$$(19)

where \({\mathbb {P}}_Y\) and \({\mathbb {P}}_{Y\mid X_i}\) are the marginal and conditional cumulative distribution functions of Y, respectively.
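As a self-contained illustration — a sketch on assumed toy linear data, not the paper's experimental code or its ten-variable benchmark — several of the measures listed above can be estimated with standard Python tooling:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
N, d = 1000, 3
X = rng.normal(size=(N, d))
beta = np.array([3.0, 1.0, 0.0])               # toy ground truth: X_0 > X_1 > X_2
y = X @ beta + rng.normal(scale=0.5, size=N)

# Correlation methods (Eqs. 7-9).
pearson  = [stats.pearsonr(X[:, i], y)[0]   for i in range(d)]
spearman = [stats.spearmanr(X[:, i], y)[0]  for i in range(d)]
kendall  = [stats.kendalltau(X[:, i], y)[0] for i in range(d)]

# Standardized regression coefficients (Eq. 10): beta_i * sigma_{X_i} / sigma_Y.
coef = np.linalg.lstsq(np.column_stack([np.ones(N), X]), y, rcond=None)[0]
src = coef[1:] * X.std(axis=0) / y.std()

# Permutation feature importance (Eq. 11) with a squared-error loss,
# using the fitted linear model as \hat{g}.
g = lambda A: np.column_stack([np.ones(len(A)), A]) @ coef
base_loss = np.mean((y - g(X)) ** 2)
pfi = []
for i in range(d):
    Xp = X.copy()
    Xp[:, i] = rng.permutation(Xp[:, i])       # X_i^pi: same marginal, link to Y broken
    pfi.append(np.mean((y - g(Xp)) ** 2) - base_loss)

ranking = list(np.argsort(pfi)[::-1])          # importance ranking of the features
```

On independent inputs like these, all four estimates recover the same ranking; with correlated or autocorrelated inputs — the scenarios examined in the virtual experiment — the methods need not agree, which is precisely what the CPI comparison quantifies.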
B Tables
In this section, we report the mean values of the CPI index, calculated over the 100 simulations generated for the 30 case studies with \(N=\) 100, 1000, and 10,000, for the four classes of methods considered.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Cappelli, F., Grimaldi, S. Feature importance measures for hydrological applications: insights from a virtual experiment. Stoch Environ Res Risk Assess 37, 4921–4939 (2023). https://doi.org/10.1007/s00477-023-02545-7