
Feature importance measures for hydrological applications: insights from a virtual experiment

  • Original Paper
  • Published in Stochastic Environmental Research and Risk Assessment

Abstract

Discriminating the role of input variables in a hydrological system, or in a multivariate hydrological study, is particularly useful. Emerging tools known as feature importance measures are increasingly being applied to this end in hydrological applications. In this study, we propose a virtual experiment to fully understand the functionality and, most importantly, the usefulness of these measures. Thirteen importance measures, belonging to four general classes of methods, are quantitatively evaluated on their ability to reproduce a benchmark importance ranking. The benchmark ranking is designed using a linear combination of ten random variables. Synthetic time series with varying distribution, cross-correlation, autocorrelation, and random noise are simulated to mimic hydrological scenarios. The results clearly suggest that a subgroup of three feature importance measures (Shapley-based feature importance, the derivative-based measure, and permutation feature importance) generally provides reliable rankings and outperforms the remaining measures, making these three preferable in hydrological applications.



Acknowledgements

This research has received no external funding.

Author information


Contributions

Both authors performed the experiments, analyzed the data and wrote the manuscript. Both authors reviewed and approved the final version of the manuscript.

Corresponding author

Correspondence to Francesco Cappelli.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

A Feature importance measures in detail

We introduce here the notation and the main notions needed to define the measures used in this work. Let Y and \(X=(X_1,\dots ,X_d)\) be a random variable and a random vector on the probability space \((\Omega ,{\mathcal {B}}(\Omega ), {\mathbb {P}})\), with \(X \in {\mathcal {X}}_d \subseteq {\mathbb {R}}^d\) and \(Y \in {\mathcal {Y}} \subseteq {\mathbb {R}}\), cumulative distribution functions \({\mathbb {P}}_X\) and \({\mathbb {P}}_Y\), and respective density functions \(p_X\) and \(p_Y\). The vector \({\textbf{X}}\) can be written as \(\textbf{X}=(X_i,{\textbf{X}}_{-i})\), where \(\textbf{X}_{-i}=\{X_l,\, l=1,\dots ,d,\, l\ne i\}\). We denote the vector of observed values of \(X_i\) as \({\textbf{x}}_i=\{x_i^{(1)},\dots ,x_i^{(N)}\}'\) and the jth observation as \(\textbf{x}^{(j)}=\{x_1^{(j)},\dots ,x_d^{(j)}\}\), associated with the target value \(y^{(j)} \in {\mathcal {Y}}\).

For the computation of the ML feature importance measures, we adopt an ML model \({\widehat{g}}\) (here, a linear model) to approximate the unknown input-output mapping. The model is fitted on the training set \(\{(\textbf{x}^{(j)},y^{(j)})\}_{j=1}^N\). We write \(L({\widehat{g}})={\mathbb {E}}[{\mathcal {L}}(Y,{\widehat{g}}({\textbf{X}}))]\) for the generalization error of the trained model, where \({\mathcal {L}}:{\mathcal {Y}}\times {\mathbb {R}} \rightarrow {\mathbb {R}}_{+}\) is the loss function.

The notion of Shapley value arises from cooperative game theory. Consider a group of players \(D=\{1,\dots ,d\}\) who can form coalitions \(K\subseteq D\); the total number of possible coalitions is \(2^d\). We denote by \(v:2^D\rightarrow {\mathbb {R}}\) the value function that assigns a reward to each coalition. The reward attributed to player i is then given by

$$\begin{aligned} \phi _i(v)=\sum _{K\subseteq D \setminus \{i\}}\frac{|K|!\,(|D|-|K|-1)!}{|D|!}\left[ v(K\cup \{i\})-v(K)\right] . \end{aligned}$$
(5)
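
As a concrete check of Eq. (5), the following minimal Python sketch (ours; the three-player toy game and all names in it are illustrative, not part of the study) enumerates the \(2^d\) coalitions and applies the combinatorial weights directly.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values phi_i(v) of Eq. (5), enumerating all
    2^d coalitions; feasible only for small d."""
    d = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        phi[i] = 0.0
        for size in range(d):
            for K in combinations(others, size):
                K = set(K)
                w = factorial(len(K)) * factorial(d - len(K) - 1) / factorial(d)
                phi[i] += w * (v(K | {i}) - v(K))
    return phi

# Toy value function: each player has a stand-alone contribution and
# players 1 and 2 earn a bonus of 2.0 when they cooperate.
contrib = {1: 3.0, 2: 1.0, 3: 0.5}
def v(K):
    return sum(contrib[p] for p in K) + (2.0 if {1, 2} <= K else 0.0)

print(shapley_values([1, 2, 3], v))
# approx {1: 4.0, 2: 2.0, 3: 0.5}: the bonus is split equally between 1 and 2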

Regarding the ALE function, its estimation requires partitioning the support of the feature of interest into K non-overlapping intervals \({\mathcal {X}}_i^k=(z_i^{k-1},z_i^{k}]\), with \(k=1,\dots ,K\). The estimate of the ALE function of feature \(X_i\) can be written as

$$\begin{aligned} \widehat{\text {ALE}}_{i}(x_{i})=\sum _{k=1}^{k(x_i)}\frac{1}{N_{i}^k} \sum _{j:\, x_i^{(j)}\in {\mathcal {X}}_i^k}\left[ \widehat{g}\left( z_{i}^k,\textbf{x}_{-i}^{(j)}\right) - \widehat{g}\left( z_{i}^{k-1},\textbf{x}_{-i}^{(j)}\right) \right] , \end{aligned}$$
(6)

for each \(x_i \in \left[ z_{i}^0,z_{i}^K\right]\), where \(k(x_i)\) denotes the index of the interval containing \(x_i\), \(z_{i}^0=\min \left\{ x_i^{(1)},\dots ,x_i^{(N)}\right\}\) and \(z_{i}^K=\max \left\{ x_i^{(1)},\dots ,x_i^{(N)}\right\}\). The term \(N_i^k\) denotes the number of data points in the kth interval for feature \(X_i\). We now report the feature importance measures employed in our analysis.
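
To illustrate Eq. (6), the following minimal Python sketch (our illustration, not the implementation used in the study) estimates the uncentered ALE curve with quantile-based interval edges and an arbitrary known model standing in for \(\widehat{g}\).

```python
import numpy as np

def ale_curve(g_hat, X, i, K=10):
    """Accumulated local effects of feature i (uncentered), Eq. (6).
    X: (N, d) sample; g_hat: fitted model callable on (N, d) arrays;
    interval edges z_i^0..z_i^K are empirical quantiles of X_i."""
    z = np.quantile(X[:, i], np.linspace(0, 1, K + 1))
    ale, acc = np.zeros(K), 0.0
    for k in range(1, K + 1):
        in_k = (X[:, i] > z[k - 1]) & (X[:, i] <= z[k])
        if k == 1:                      # include the left edge in bin 1
            in_k |= X[:, i] == z[0]
        Xk = X[in_k]
        if len(Xk) > 0:
            hi, lo = Xk.copy(), Xk.copy()
            hi[:, i], lo[:, i] = z[k], z[k - 1]
            acc += np.mean(g_hat(hi) - g_hat(lo))   # local effect in bin k
        ale[k - 1] = acc
    return z, ale

# Illustration with a known model Y = 2*X1 + X2^2
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
g = lambda A: 2 * A[:, 0] + A[:, 1] ** 2
z, ale = ale_curve(g, X, i=0)
print(np.round(ale, 2))   # accumulates roughly 2*(z_k - z_0): a linear effect
```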

  1. Correlation methods

    • Pearson correlation coefficient \(r_i\) of feature \(X_i\) is defined as follows

      $$\begin{aligned} r_i=\frac{{\mathbb {E}}(YX_i)-{\mathbb {E}}(X_i)\,{\mathbb {E}}(Y)}{\sqrt{{\mathbb {E}}(X_i^2)-({\mathbb {E}}(X_i))^2}\sqrt{{\mathbb {E}}(Y^2)-({\mathbb {E}}(Y))^2}} = \frac{\text {cov}(Y, X_i)}{\sigma _{X_i}\sigma _{Y}}, \end{aligned}$$
      (7)

      where \(\text {cov}(Y,X_i)\) denotes the covariance between Y and \(X_i\), while \(\sigma _{X_i}\) and \(\sigma _Y\) denote the standard deviations of \(X_i\) and Y, respectively.

    • Spearman’s rank correlation coefficient \(\rho _i\) of feature \(X_i\) can be written as

      $$\begin{aligned} \rho _i=\frac{\text {cov}(\text {Rank}(Y), \text {Rank}(X_i))}{\sigma _{\text {Rank}(X_i)}\sigma _{\text {Rank}(Y)}}. \end{aligned}$$
      (8)

      Note that the formula above is the same as that of Pearson's coefficient, but applied to the ranks of the variables rather than to their raw values.

    • Kendall rank correlation coefficient \(\tau _i\) of feature \(X_i\) can be written as

      $$\begin{aligned} \tau _i=\frac{n_c-n_d}{\sqrt{(n_0-n_1)(n_0-n_2)}}, \end{aligned}$$
      (9)

      where \(n_c\) is the number of concordant pairs, \(n_d\) is the number of discordant pairs, \(n_0=n(n-1)/2\) is the total number of data pairs, and \(n_1\) and \(n_2\) are the numbers of tied pairs in \(X_i\) and Y, respectively. A short sketch computing all three coefficients follows below.
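
All three coefficients are available off the shelf; a minimal Python sketch (ours) using the scipy.stats implementations, where kendalltau applies the tie corrections of Eq. (9):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(size=500)
y = 2.0 * x + rng.normal(scale=0.5, size=500)   # noisy linear relation

r, _ = stats.pearsonr(x, y)       # Eq. (7)
rho, _ = stats.spearmanr(x, y)    # Eq. (8)
tau, _ = stats.kendalltau(x, y)   # Eq. (9), tau-b (tie-corrected)
print(f"r = {r:.3f}, rho = {rho:.3f}, tau = {tau:.3f}")
```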

  2. Regression-based methods

    • Standardized regression coefficient \(\text {SCR}_i\) of feature \(X_i\) is defined as

      $$\begin{aligned} \text {SCR}_i=\frac{\beta _i\sigma _{X_i}}{\sigma _Y}, \end{aligned}$$
      (10)

      where \(\beta _i\) is the coefficient of \(X_i\) in the linear model.
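
A minimal sketch (ours) of Eq. (10): fit an ordinary least-squares model and rescale each coefficient by the ratio of the standard deviations, which is equivalent to regressing the standardized variables.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000
X = rng.normal(size=(N, 3))
y = 3 * X[:, 0] + 1 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(size=N)

# OLS fit with intercept; beta_i are the raw regression coefficients
A = np.column_stack([np.ones(N), X])
beta = np.linalg.lstsq(A, y, rcond=None)[0][1:]

scr = beta * X.std(axis=0) / y.std()   # Eq. (10)
print(np.round(scr, 3))                # importance ranking X1 > X2 > X3
```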

  3. ML feature importance measures

    • Permutation feature importance PFI\(_i\) of feature \(X_i\) is defined as

      $$\begin{aligned} \text {PFI}_i = {\mathbb {E}}[{\mathcal {L}}(Y,{\widehat{g}}(X_i^{\pi }, \textbf{X}_{-i}))]-{\mathbb {E}}[{\mathcal {L}}(Y,{\widehat{g}}(X_i, \textbf{X}_{-i}))], \end{aligned}$$
      (11)

      where \(X_i^\pi\) denotes a permuted version of \(X_i\), which preserves the marginal distribution of \(X_i\) while breaking its association with Y and \(\textbf{X}_{-i}\), and \({\mathbb {E}}[\cdot ]\) is the expectation operator.

    • Permute and Relearn feature importance \(\text {VI}_i^{\pi \text {L}}\) of feature \(X_i\) is defined as

      $$\begin{aligned} \text {VI}_i^{\pi \text {L}}= {\mathbb {E}}[{\mathcal {L}}(Y, \widehat{g}^{\pi ,i}(X_i, \textbf{X}_{-i}))]-{\mathbb {E}}[{\mathcal {L}}(Y, \widehat{g}(X_i, \textbf{X}_{-i}))], \end{aligned}$$
      (12)

      where \({\widehat{g}}^{\pi ,i}\) is the ML model retrained after permuting \(X_i\) in the training set. A sketch of both permutation-based measures follows below.
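
A compact sketch (ours) of Eqs. (11)-(12), using an OLS model as \(\widehat{g}\), squared-error loss, and in-sample evaluation for brevity; the relearn variant refits the model after the permutation.

```python
import numpy as np

def fit_linear(X, y):
    """OLS stand-in for the ML model g-hat."""
    w = np.linalg.lstsq(np.column_stack([np.ones(len(X)), X]), y, rcond=None)[0]
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ w

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 3))
y = 3 * X[:, 0] + X[:, 1] + rng.normal(size=1000)
g = fit_linear(X, y)

for i in range(3):
    Xp = X.copy()
    Xp[:, i] = rng.permutation(Xp[:, i])       # X_i^pi: marginal kept, link broken
    pfi = mse(y, g(Xp)) - mse(y, g(X))         # Eq. (11)
    g_pi = fit_linear(Xp, y)                   # relearn on the permuted data
    vi_rel = mse(y, g_pi(X)) - mse(y, g(X))    # Eq. (12)
    print(f"X{i + 1}: PFI = {pfi:.2f}, relearn = {vi_rel:.2f}")
```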

    • Shapley-based feature importance \(\text {SbFI}_i\) of feature \(X_i\) is defined as

      $$\begin{aligned} \text {SbFI}_i=\sum _{K\subseteq D \setminus \{i\}}\frac{|K|!\,(|D|-|K|-1)!}{|D|!}\left[ {\widehat{g}}_{K\cup \{i\}}(\textbf{x}_{K\cup \{i\}})-{\widehat{g}}_K(\textbf{x}_{K})\right] , \end{aligned}$$
      (13)

      where \({\widehat{g}}_K ({\textbf{x}}_K )={\mathbb {E}}_{{\textbf{X}}_C} [\widehat{g}({\textbf{x}}_K,{\textbf{X}}_C )]\), with \(C=D\setminus K\) being the complement of K.

    • Shapley feature importance \(\text {SFIMP}_i\) of feature \(X_i\) is defined using a value function based on the model generalization error, i.e.,

      $$\begin{aligned} \text {SFIMP}_i=\sum _{K\subseteq D \setminus \{i\}}\frac{|K|!\,(|D|-|K|-1)!}{|D|!}\left[ {\widehat{L}}_{K}({\widehat{g}})-{\widehat{L}}_{K\cup \{i\}}({\widehat{g}})\right] , \end{aligned}$$
      (14)

      where \({\widehat{L}}_{K}({\widehat{g}})=\frac{1}{N^2}\sum _{j=1}^N \sum _{k=1}^N {\mathcal {L}}\left({\widehat{g}}( x_K^{(j)}, {\textbf{x}}_C^{(k)}), y^{(j)}\right)\) estimates the generalization error when only the features in K carry information, marginalizing the complement C over the sample.
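
Exact evaluation of Eqs. (13)-(14) requires all \(2^d\) coalitions, so it is feasible only for small d. The sketch below (ours; the one-pairing-per-observation Monte Carlo shortcut in loss_hat is an assumption made for brevity, not the estimator of the paper) computes SFIMP with a squared-error value function.

```python
import numpy as np
from itertools import combinations
from math import factorial

def loss_hat(g, X, y, K, rng):
    """Estimate L_K: squared-error loss when only the features in K keep
    their own values and the complement C is drawn from the sample
    (one random pairing per observation instead of all N^2, for brevity)."""
    Xm = X[rng.integers(0, len(X), size=len(X))].copy()
    cols = list(K)
    Xm[:, cols] = X[:, cols]
    return np.mean((y - g(Xm)) ** 2)

def sfimp(g, X, y, seed=0):
    """SFIMP of Eq. (14) by full coalition enumeration (small d only)."""
    d = X.shape[1]
    phi = np.zeros(d)
    rng = np.random.default_rng(seed)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for s in range(d):
            for K in combinations(others, s):
                w = factorial(s) * factorial(d - s - 1) / factorial(d)
                phi[i] += w * (loss_hat(g, X, y, set(K), rng)
                               - loss_hat(g, X, y, set(K) | {i}, rng))
    return phi

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 3))
y = 3 * X[:, 0] + X[:, 1] + rng.normal(size=500)
g = lambda A: 3 * A[:, 0] + A[:, 1]     # stand-in for a fitted model
print(np.round(sfimp(g, X, y), 2))      # X1 dominates, X3 near 0
```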

    • Derivative-based measure \(\kappa _i^{\text {ALE}}\) of feature \(X_i\) is defined as

      $$\begin{aligned} \kappa _i^{\text {ALE}}= \frac{1}{K}\sum _{k=1}^{K} {\mathbb {E}}\left[ \frac{{\widehat{g}}\left( z_i^k, \textbf{X}_{-i}\right) -\widehat{g}\left( z_i^{k-1}, \textbf{X}_{-i}\right) }{z_i^k-z_i^{k-1}}\right] ^2\frac{\sigma _{X_i}^2}{\sigma _Y^2}. \end{aligned}$$
      (15)
    • ALE-based feature importance \(\text {FI}^{\text {ALE}}_i\) of feature \(X_i\) is defined as

      $$\begin{aligned} \text {FI}^{\text {ALE}}_i= \sqrt{{\mathbb {V}}(\text {ALE}_i(X_i))}, \end{aligned}$$
      (16)

      where \({\mathbb {V}}[\cdot ]\) denotes the variance operator.
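
Both ALE-based summaries follow from the same binned finite differences; a self-contained Python sketch (ours, with a stand-in model, quantile bins, and the model output used as a proxy for \(\sigma _Y^2\)):

```python
import numpy as np

rng = np.random.default_rng(4)
N, K_bins = 2000, 20
X = rng.normal(size=(N, 2))
g = lambda A: 2 * A[:, 0] + A[:, 1] ** 2     # stand-in fitted model
y = g(X)
i = 0

z = np.quantile(X[:, i], np.linspace(0, 1, K_bins + 1))
slopes, ale, acc = [], [], 0.0
for k in range(1, K_bins + 1):
    in_k = (X[:, i] > z[k - 1]) & (X[:, i] <= z[k])
    if k == 1:
        in_k |= X[:, i] == z[0]
    Xk = X[in_k]
    if len(Xk) > 0 and z[k] > z[k - 1]:
        hi, lo = Xk.copy(), Xk.copy()
        hi[:, i], lo[:, i] = z[k], z[k - 1]
        diff = g(hi) - g(lo)
        slopes.append(np.mean(diff) / (z[k] - z[k - 1]))  # E[.] of Eq. (15)
        acc += np.mean(diff)
    ale.append(acc)

kappa = np.mean(np.square(slopes)) * X[:, i].var() / y.var()   # Eq. (15)
ale = np.array(ale) - np.mean(ale)       # center the ALE curve
fi_ale = ale.std()                       # Eq. (16)
print(f"kappa_ALE = {kappa:.3f}, FI_ALE = {fi_ale:.3f}")
```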

  4. SA methods

    • Variance-based sensitivity measure of feature \(X_i\) is defined as

      $$\begin{aligned} \eta _i^2=\frac{{\mathbb {V}}[Y] -{\mathbb {E}}_{X_i}[{\mathbb {V}}[Y \mid X_i]]}{{\mathbb {V}}[Y]}. \end{aligned}$$
      (17)
    • Density-based sensitivity measure \(\delta _i\) of feature \(X_i\) can be written as

      $$\begin{aligned} \delta _i=\frac{1}{2}{\mathbb {E}}_{X_i}\left[ \int _{{\mathcal {Y}}} |p_Y(y)-p_{Y\mid X_i}(y) |dy\right] , \end{aligned}$$
      (18)

      where \(p_Y\) and \(p_{Y\mid X_i}\) are the marginal output density and the conditional density, respectively.

    • Cumulative distribution-based sensitivity measure \(\beta ^{\text {KS}}_i\) of feature \(X_i\) can be written as

      $$\begin{aligned} \beta ^{\text {KS}}_i={\mathbb {E}}_{X_i}\left[ \sup _{y \in {\mathcal {Y}}}\left| {\mathbb {P}}_Y(y)-{\mathbb {P}}_{Y\mid X_i }(y)\right| \right] , \end{aligned}$$
      (19)

      where \({\mathbb {P}}_Y\) and \({\mathbb {P}}_{Y\mid X_i}\) are the marginal and the conditional cumulative distribution functions of Y, respectively.
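
All three SA measures admit simple given-data estimates: partition the sample into equal-frequency classes of \(X_i\) and compare the conditional distribution of Y within each class against the marginal one. A minimal Python sketch (ours; the class count, evaluation grid, and histogram bins are arbitrary choices):

```python
import numpy as np

def sa_measures(x, y, M=20, bins=30):
    """Given-data estimates of Eqs. (17)-(19): split the sample into M
    equal-frequency classes of X_i and compare the conditional
    distribution of Y in each class with the marginal one."""
    N = len(x)
    classes = np.array_split(np.argsort(x), M)
    grid = np.linspace(y.min(), y.max(), 200)
    F = np.searchsorted(np.sort(y), grid, side="right") / N        # marginal cdf
    p, edges = np.histogram(y, bins=bins, density=True)            # marginal pdf
    widths = np.diff(edges)
    ev, delta, beta_ks = 0.0, 0.0, 0.0
    for idx in classes:
        yc, w = y[idx], len(idx) / N
        ev += w * yc.var()                                         # E[V[Y|X_i]]
        pc, _ = np.histogram(yc, bins=edges, density=True)
        delta += 0.5 * w * np.sum(np.abs(p - pc) * widths)         # Eq. (18)
        Fc = np.searchsorted(np.sort(yc), grid, side="right") / len(idx)
        beta_ks += w * np.max(np.abs(F - Fc))                      # Eq. (19)
    eta2 = (y.var() - ev) / y.var()                                # Eq. (17)
    return eta2, delta, beta_ks

rng = np.random.default_rng(5)
x = rng.normal(size=5000)
y = x ** 2 + 0.1 * rng.normal(size=5000)    # strong but non-monotonic link
eta2, delta, beta_ks = sa_measures(x, y)
print(f"eta2 = {eta2:.3f}, delta = {delta:.3f}, beta_KS = {beta_ks:.3f}")
```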

B Tables

In this appendix, we report the mean values of the CPI index computed over the 100 simulations generated for the 30 case studies, with \(N=\) 100, 1000, and 10000, for the four classes of methods considered.

Table 8 CPI estimates for all case studies reported in Table 2 for \(N=\) 100, 1000 and 10000 using the Correlation Methods and Regression-based method
Table 9 CPI estimates for all case studies reported in Table 2 for \(N=\) 100, 1000 and 10000 using the ML feature importance measures
Table 10 CPI estimates for all case studies reported in Table 2 for \(N=\) 100, 1000 and 10000 using the SA measures

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


Cite this article

Cappelli, F., Grimaldi, S. Feature importance measures for hydrological applications: insights from a virtual experiment. Stoch Environ Res Risk Assess 37, 4921–4939 (2023). https://doi.org/10.1007/s00477-023-02545-7

