Total Effects with Constrained Features



Introduction
Determining feature importance is a crucial task in machine learning and statistical investigations. In machine learning, it is an integral part of post-hoc explainability [41,15], where it helps us understand the degree to which a model relies on the available features. This understanding has two final objectives [19,2]. The first objective is dimensionality reduction, which involves screening out features that do not contribute to the model's predictions. The second objective is identifying the features that are most important for further modeling efforts or data collection by domain experts. Over the years, several feature importance measures have been developed to perform this task. On the one hand, in machine learning, a particularly important family is represented by Breiman's permutation importance measures [4]. Breiman originally defines them based on the notion of mean decrease accuracy [2]. The intuition is as follows: a given machine learning model (e.g., a random forest) is fitted to a feature-target dataset, yielding a given predictive accuracy. The values of a specific feature are then permuted to break its relation to the target, and the predictive accuracy is reassessed on this perturbed dataset. The difference between the new (possibly degraded) and the original accuracies provides an indication of the importance of the feature. However, [2, Proposition 2] show that there is no consensus on the exact mathematical formulation of the mean decrease accuracy, and they prove that alternative software implementations yield different values. On the other hand, in statistics, a central role is played by measures of statistical association. Several indicators have been developed over the years, from the original Pearson linear correlation coefficient [47] to the new correlation coefficient of [8]. In this family, a significant role is played by the so-called total order sensitivity effects [25], total effects for short. Total effects are defined as the difference between the variance of the target and the portion that remains after all features have been fixed, with the exception of the feature of interest. [25] show that, when features are independent, the total effect of a given feature equals the sum of all terms in the ANOVA expansion associated with that feature. Also, we can calculate total effects as the expectation of the squared difference of the values of the model output at two points that differ only in the value of the feature of interest. The new point can be obtained by a simple permutation of the values of the features in the dataset. This formulation is known as Jansen's estimation method [30,31] and has inspired the so-called pick-and-freeze designs. In the class of pick-and-freeze estimators, the one proposed by [29] is proven to be asymptotically efficient. [22] show that, even in the case of dependent features, total effects retain an interpretation from a relative error perspective: under a squared loss function, a total index is the expected loss increase for approximating the input-output mapping with a function that does not contain the terms associated with the feature of interest. Also, [2] show that total effects are closely related to Breiman's mean decrease accuracy. In particular, [2, Proposition 2] show that the different software implementations do not converge to the total effects but to a quantity whose bias increases with dependence and is potentially amplified by interactions. They propose corrections so that the calculation of Breiman's mean decrease accuracy indeed converges to a total effect in the case in which the machine learning model is a random forest.

The presence of statistical dependence complicates the calculation of total effects (we refer to [9, Ch. 5] for a thorough account). First, the interpretation in terms of the correspondence with the sum of terms in the ANOVA decomposition is lost. Second, the possibility to use a Jansen-type estimator is also not straightforward. In fact, while under independence the new points can be obtained with unrestricted permutations, the presence of dependencies challenges such a procedure. The problem is similar to the one signalled by [26] in the machine learning literature: unrestricted permutations may make the new points fall in regions that are far from where the data lie, forcing the machine learning model to extrapolate. These difficulties are compounded when features are not only dependent but also constrained on non-Cartesian supports. Constraints arise in applications when physical or business reasons require features to be located in certain regions. Here, an unrestricted permutation could lead to a feature value that falls outside a given constraint, making the evaluation of the model not only at risk of extrapolation, but also meaningless. Furthermore, if constraints give rise to disconnected feature domains, they make the functional ANOVA expansion ill-defined [46]. [34] propose a numerical approach based on the combination of rejection sampling and quadrature for the calculation of variance-based indices, with a focus on numerical aspects. However, a statistical analysis of possible estimators with constrained inputs is missing. Our goal is to address the estimation of total effects with both dependent and constrained features, considering numerical as well as theoretical aspects.

We proceed as follows. First, we extend the estimator of [31] to the case of dependent inputs. We show that it is still possible to estimate total effects under input dependence using a Jansen-like approach if the new value of the feature is obtained under conditional independence. We then propose a generalized winding stairs design based on the Knothe-Rosenblatt transform that can be used in association with a vast family of input dependencies. However, while this design conceptually pushes the boundaries of available methods for dependent inputs, it becomes impractical when inputs are constrained. We then introduce a new estimator of total indices by applying a weighting factor (called the density quotient) to the extended Jansen's estimator. We show that the density quotient can be reinterpreted as a block-copula density that vanishes when inputs are outside the constraints and becomes unity when inputs are independent. We then formulate a U-statistic version of the estimator and obtain a central limit theorem. However, the new U-statistic estimator turns out to be computationally expensive, as it requires the evaluation of the model at n^2 points. We then propose an alternative estimator based on a single permutation that reduces the computational burden. We first consider the simplest estimator, with the permutation given by a one-shift in the coordinate of interest, and prove a central limit theorem for this estimator. We then extend the result to general derangements in the coordinate of interest. To further abate the computational burden, we also introduce a nearest-neighbour estimator that makes the estimation cost independent of the number of features. We derive analytical expressions for the estimators in the case of linear models and Gaussian inputs. We then challenge the estimators on test cases of increasing complexity, starting with Cartesian domains with dependent features, moving to connected non-Cartesian domains (e.g., circle and simplex) and to disconnected non-Cartesian domains (such as non-overlapping triangles and Sierpinski gaskets), and concluding with an application to a realistic simulator, the flood example of [10].

Total Effects: Bridging Old and New
In this section, we review total effects from a fresh perspective. We discuss the covariance representation of total effects and establish a link with an early result by Fréchet, underlining the role of conditional independence in the estimation of total effects with independent as well as dependent inputs. In Section 2.2, we highlight the role of conditional independence for such a representation. In Section 2.3, we propose a new estimator based on winding stairs and on the Knothe-Rosenblatt transformation for the case in which input dependence can be expressed via a Gaussian copula.

A Fréchet Perspective
Let us consider a reference probability space (Ω, B(Ω), pr), where B(Ω) is a Borel σ-algebra. Let also X, Y be random variables on (Ω, B(Ω), pr), with supports X, Y. We let X = (X_1, X_2, ..., X_d) be a d-dimensional random vector in R^d, so that X ⊆ R^d, and consider a univariate Y, with Y ⊆ R. For the moment, we make the further assumption that the support of X is Cartesian, that is, X = X_1 × X_2 × ... × X_d. We denote the cumulative distribution functions and probability density functions of X and Y by F_X(x), f_X(x) and F_Y(y), f_Y(y), respectively. For notational simplicity, we regard X and Y as continuous in the remainder. For u ⊆ [d] = {1, 2, ..., d}, we let x_u denote the |u|-dimensional vector whose components are indexed by u, and x_{-u} the (d - |u|)-dimensional vector whose components are indexed by the complement of u, -u = [d] \ u. For a single factor i, we have u = {i} and we write the all-but-one set [d] \ {i} as -i. The total effect of X_u is defined as [25]

    τ_u = E[V[Y | X_{-u}]] = V[Y] - V[E[Y | X_{-u}]].   (1)

The literature has also introduced the normalized total effect as T_u = τ_u / V[Y]. Using an argument of [16], we find the following useful equalities.

Lemma 1.
Let Y' be a replicate of Y conditionally independent given X_{-u}, i.e., Y and Y' have the same distribution and satisfy pr(Y ≤ y, Y' ≤ y' | X_{-u}) = pr(Y ≤ y | X_{-u}) pr(Y' ≤ y' | X_{-u}). Then

    τ_u = V[Y] - cov(Y, Y') = (1/2) E[(Y - Y')^2].   (2)

Proof. [16] shows that cov(Y, Y') = V[E[Y | X_{-u}]]. Then, we obtain the covariance representation of τ_u: τ_u = V[Y] - V[E[Y | X_{-u}]] = V[Y] - cov(Y, Y'). Now, since Y' is an independent replicate of Y conditionally on X_{-u}, Y and Y' share the same distribution, so that (1/2) E[(Y - Y')^2] = V[Y] - cov(Y, Y').

The second equality in (2) can be interpreted in terms of Jansen's equality for total effects [30,31], of which it is a generalization, as it does not require feature independence. In simulation and machine learning, Y is often a function of X, Y = g(X), g : X → R. Suppose that g is square integrable and that it can be decomposed as

    g(x) = Σ_{u ∈ 2^[d]} g_u(x_u),   (4)

with 2^[d] the power set of [d]. Under input independence, we can expand V[Y] via the well-known functional ANOVA decomposition [14,55,43,58]

    V[Y] = Σ_{u ∈ 2^[d], u ≠ ∅} V[g_u(X_u)],   (5)

with V[g_u(X_u)] the variance of g_u(X_u) in (4). In [25,54], the total effect of input X_j is defined as the sum of all terms in the right-hand side of (5) that contain index j, and it is shown that

    τ_j = Σ_{u : j ∈ u} V[g_u(X_u)].   (6)

However, this identity does not hold if features are statistically dependent. Under dependence, τ_u remains defined as in (1) and enjoys an interpretation in terms of the L2 approximation error, as established in [22]. In an argument similar to [51], [22] consider that the space L2(X) can be decomposed into a direct sum L2(X) = M_{-u} ⊕ M⊥_{-u}, where M_{-u} contains all L2 functions which solely depend on x_{-u} and M⊥_{-u} is its orthogonal complement. In general, for dependent features, the terms in (4) are not mutually orthogonal. If we ask how accurately g(x) - g_0 can be approximated without the features in x_u, then the answer is

    min_{h ∈ M_{-u}} || g - g_0 - h ||^2.   (7)

If we consider the L2 norm weighted with the density of X, then we regain (1) from (7), as the minimizer is then the conditional expectation of g given X_{-u}. Conditional independence plays a central role in deriving (2): it is this property that enables one to replace E[Y | X_{-u}] by Y' in (2). We show that it also plays a central role in estimating τ_u under input dependence via winding stairs and pick-and-freeze designs.
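Jansen's equality in Lemma 1 can be checked numerically. Below is a minimal NumPy sketch (ours, for illustration only), assuming a toy additive model g(x) = x1 + x2 with independent standard-normal features, for which the total effect of x1 is exactly Var(x1) = 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model with independent standard-normal features (our assumption):
# g(x) = x1 + x2, so the total effect of x1 is exactly 1.
def g(x):
    return x[:, 0] + x[:, 1]

n = 200_000
X = rng.standard_normal((n, 2))

# Conditionally independent replicate of Y: resample x1 only, keep x_{-1}.
X_mix = np.column_stack([rng.standard_normal(n), X[:, 1]])

# Jansen's equality: tau_1 = (1/2) E[(Y - Y')^2]
tau_1 = 0.5 * np.mean((g(X) - g(X_mix)) ** 2)
print(round(tau_1, 2))  # analytical value: 1.0
```

Under independence, the conditionally independent replicate reduces to an unconditional resample of the coordinate of interest, which is why the unrestricted permutation of classical designs is valid here.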

Conditional Independence and Total Effect Estimation
The gold standard for obtaining estimates for total effects under input independence is the Sobol' method, i.e., a pick-and-freeze design paired with Jansen's estimator [31].
Let X, X' be independent d-dimensional input vectors. For u ⊆ [d], we use the notation X'_u : X_{-u} to denote the d-dimensional vector whose components indexed by u are taken from X' and whose components indexed by -u are taken from X. Now, let X'_u be a replicate of X_u, independent of X_u conditionally on X_{-u}. Then (X'_u : X_{-u}) and X are identically distributed, and Y = g(X) and Y' = g(X'_u : X_{-u}) are identically distributed and conditionally independent given X_{-u}. The second equality in Equation (2) can then be rewritten as

    τ_u = (1/2) E[(g(X) - g(X'_u : X_{-u}))^2].   (8)

By Lemma 1, Equation (8) is true even under feature dependence. However, independence makes the design of an estimator for τ_u straightforward. One generates two independent samples of size n from the input distribution. Let us denote them by X_A = (X_{A,i})_{i=1,...,n} and X_B = (X_{B,i})_{i=1,...,n}. The columns of these sample matrix blocks are recombined, copying input realizations for factor(s) j ∈ u from the second sample (B) into the first sample (A) to form the pick-and-freeze input sample block X_{B,u} : X_{A,-u}. The model is then evaluated to obtain the output samples Y_A = g(X_A) and Y_{B_u:A} = g(X_{B,u} : X_{A,-u}). Combining them via Jansen's equality, we obtain the estimator

    τ̂_u = (1/(2n)) Σ_{i=1}^n (Y_{A,i} - Y_{B_u:A,i})^2.   (9)

After the introduction of this design in [55], [25], works such as [53,56,18,17,50] have developed it further, refining several aspects. Most of these works rely on the independence assumption, which we remove in the remainder of this section.
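The two-block design above can be sketched in a few lines of NumPy. The model and coefficient values are our illustrative assumptions: for a linear model with independent standard-normal inputs, τ_j = β_j^2.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative model (assumption): g(x) = 2*x1 + x2 + 0.5*x3 with
# independent standard-normal inputs, so tau_j = beta_j^2.
beta = np.array([2.0, 1.0, 0.5])
g = lambda X: X @ beta

n, d = 100_000, 3
XA = rng.standard_normal((n, d))   # sample block A
XB = rng.standard_normal((n, d))   # sample block B
YA = g(XA)

tau_hat = np.empty(d)
for j in range(d):
    X_mix = XA.copy()
    X_mix[:, j] = XB[:, j]          # copy column j from block B into block A
    tau_hat[j] = 0.5 * np.mean((YA - g(X_mix)) ** 2)   # Jansen's estimator

print(np.round(tau_hat, 2))  # analytical values: beta**2 = [4.0, 1.0, 0.25]
```

The total cost is n(d + 1) model runs: one evaluation of block A plus one recombined block per factor.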
Let Z be a general m-dimensional random vector that takes its values on a Cartesian product space. Again for notational simplicity, we assume that the joint probability distribution of Z has a density function h(z) with respect to the Lebesgue measure. For u, v two disjoint subsets of [m] = {1, 2, ..., m}, we denote the disjoint union of u and v by u + v. If Z_u and Z_v are conditionally independent given Z_w, the conditional densities factorize as

    h_{u+v|w}(z_u, z_v | z_w) = h_{u|w}(z_u | z_w) h_{v|w}(z_v | z_w).   (10)
Multiplying (10) by h_w(z_w), we find

    h_{u+v+w}(z_u, z_v, z_w) h_w(z_w) = h_{u+w}(z_u, z_w) h_{v+w}(z_v, z_w).   (11)

Thus, under conditional independence the joint density factors into the product of two terms, which one can symmetrically write as per the second identity in (11). As a direct consequence for Jansen's estimator, when considering the joint distribution of X'_u, X_u and X_{-u}, we obtain the following result.

Proposition 3. Let X' be a replicate of X, conditionally independent given X_{-u}. Written in density terms, we find two interchangeable representations of the total effect,

    τ_u = (1/2) ∫∫ (g(x'_u : x_{-u}) - g(x))^2 f(x) f_{u|-u}(x'_u | x_{-u}) dx'_u dx
        = (1/2) ∫∫ (g(x'_u : x_{-u}) - g(x))^2 f(x'_u : x_{-u}) f_{u|-u}(x_u | x_{-u}) dx'_u dx.   (12)

The second term in (12) is the numerator of Equation (2.11) in [35, p. 939]. Equation (12) is an essential ingredient for the estimation of total Sobol' indices under feature dependence. In the next section, we exploit Equation (12) to create a generalized design.
In Section 3, we use it for the definition of total effect estimators in the presence of constrained (i.e., non-Cartesian) input domains.

Winding stairs for Dependent Inputs with Gaussian Copula
We propose a new estimation strategy that combines Proposition 3 with the Knothe-Rosenblatt transformation [33,52]. This transformation is discussed in association with the estimation of variance-based sensitivity indices with dependent features in the works of [38,39,36]. The intuition is to move from the dependent features X to a set of independent features U uniformly distributed in the unit hypercube. Then the U features are independent, and one can apply the theory and algorithms of the functional ANOVA expansion under independence. Formally, the Knothe-Rosenblatt transformation is

    u_1 = F_1(x_1), u_2 = F_{2|1}(x_2 | x_1), ..., u_d = F_{d|1,...,d-1}(x_d | x_1, ..., x_{d-1}).   (13)

Hence, after applying the transformation, one can calculate global sensitivity indices on the independent coordinates in U. However, after transformation the physical meaning of the original features may be lost, and it might be difficult to transfer the results back to the original scale. Furthermore, the ranking depends on the order with which the features enter the transformation.
We propose a way to use the Knothe-Rosenblatt transformation for the calculation of total indices that avoids the dependence of the ranking on the feature ordering and allows us to remain within the original feature space. The key is to combine two facts. The first is that, by Proposition 3, the total effect of an individual feature, τ_j, is associated with the conditional density f_{j|-j}. The second is that this coincides with the density of the last term in (13). Then, if this last term is available from the transformation, we can simply draw realizations of an independent standard uniform random variable and apply the inverse transformation t_j : u → x_j = F^{-1}_{j|-j}(u | X_{-j} = x_{-j}). This transformation is, indeed, the inverse of the last term in (13) (up to a re-ordering of the input variables): we obtain an X_j which is conditional on all the remaining features. We exploit this fact to introduce a generalized winding stairs total effect estimator for the case of dependent features. The term winding stairs originates with [30]; we refer to [6], [45] and [21] for further reviews. Assume that X^(0) is a random copy of X, and U is a random vector of d independent standard uniformly distributed random variables, independent of X^(0). In the classical winding stairs design, under independence, the jth column in the feature sample matrix is replaced by an independent copy of X_j. Under dependence, using Lemma 1, we can cyclically replace the jth entry in the input vector with a conditionally independent one. An appropriate way to obtain this conditionally independent sample is a Knothe-Rosenblatt transformation of the form: in the jth step, the jth component is altered via

    X^(j)_j = F^{-1}_{j|-j}(U_j | X^(j-1)_{-j}),   for j = 1, 2, ..., d,   (14)

with the remaining components carried over, X^(j)_{-j} = X^(j-1)_{-j}. When sampling, we obtain blocks in which Y^(j) = g(X^(j)) is a replicate of Y^(j-1) conditionally independent given X_{-j}. From Lemma 1, one obtains a winding stairs total effect formula

    τ_j = (1/2) E[(Y^(j-1) - Y^(j))^2].   (15)

The associated estimator is a variant of (9). As observed in [21], such an estimator is a sample average, so that the sample variance can be used to approximate the estimator's variance.
If we model input dependence via Gaussian copulas, the transformations are linear in standard normal coordinates. For this, let Ψ_j, j = 1, ..., d, be the transformations from the marginals into the standard normal distribution, and let Z_j be a standard normal random variable independent of X. Then there exist linear combinations of Ψ_{-j}(X_{-j}) and Z_j, with coefficients γ^(j), such that the resulting random vectors are identically N(0, Σ) distributed and conditionally independent given X_{-j}. These linear combinations can be extracted from a Cholesky decomposition of a reordered covariance matrix in which the jth row/column is moved to the last position (we refer to the proof of Theorem 14 for the computation of the coefficients γ^(j) in the linear combinations in (16)). Hence (14) specializes to a composition of the linear combination with Φ and the inverse marginal transformation (17), where Φ is the standard normal cumulative distribution function. From a numerical viewpoint, the computational cost associated with the winding stairs approach is n(d+1) model evaluations: one samples n random values of X and then considers one-at-a-time variations in each input for each of the n values. This design generalizes the winding stairs approach to dependence structures that include the broad family of Gaussian copulas. However, for cases in which the Knothe-Rosenblatt transformation is not available, the generalized winding stairs design becomes impractical. This happens as soon as the features do not live on a Cartesian support.
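A single step of this generalized winding stairs design can be sketched for a bivariate Gaussian pair, where the Knothe-Rosenblatt conditional is available in closed form. The correlation value and the linear model are our illustrative assumptions; for g(x) = x1 + x2 with unit-variance margins, τ_1 = 1 - ρ^2.

```python
import numpy as np

rng = np.random.default_rng(2)

# Bivariate standard-normal pair with correlation rho (assumption).
rho = 0.6
n = 200_000
Z = rng.standard_normal((n, 2))
X = np.column_stack([Z[:, 0], rho * Z[:, 0] + np.sqrt(1 - rho**2) * Z[:, 1]])

g = lambda X: X[:, 0] + X[:, 1]
Y = g(X)

# Knothe-Rosenblatt step: conditionally independent replicate of X1 given X2,
# using X1 | X2 = x2 ~ N(rho * x2, 1 - rho^2).
X1_cond = rho * X[:, 1] + np.sqrt(1 - rho**2) * rng.standard_normal(n)
Y1 = g(np.column_stack([X1_cond, X[:, 1]]))

# Winding stairs formula: tau_1 = (1/2) E[(Y - Y')^2]
tau_1 = 0.5 * np.mean((Y - Y1) ** 2)
print(round(tau_1, 2))  # analytical value: 1 - rho^2 = 0.64
```

The essential difference from the independent case is that the replacement value is drawn from the conditional, not the marginal, distribution of the coordinate of interest.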
In the next sections, we therefore introduce alternative approaches that allow us to generalize Lemma 1 to more complex dependence structures.

Total Effects under Feature Dependence via Reweighting
Our purpose in this section is to introduce an estimation strategy that allows us to relax the traditional condition of a Cartesian domain; that is, we allow for X ⊊ X_1 × X_2 × ... × X_d. We assume that the features are distributed with a joint density f that vanishes outside X. In order to use an estimation strategy with a classical pick-and-freeze design, we start with the following definition.

Definition 4. Let X' be an independent copy of X. We call the function

    ι_u(X', X) = f(X'_u : X_{-u}) / ( f_u(X'_u) f_{-u}(X_{-u}) )   (18)

the density quotient of X' on X for the feature list u.
By Proposition 3, the density quotient in (18) satisfies ι_u(X', X) f_u(X'_u) = f_{u|-u}(X'_u | X_{-u}). To illustrate, for a Cartesian domain and independent inputs, we have ι_u(X', X) = 1. We also have a compact expression for the case in which the dependence among two features can be expressed via a Gaussian copula.
Example 5. Under a bivariate Gaussian copula, the density quotient for pairwise dependence can be obtained in compact form as follows. Let X_i and X_j be two random variables whose dependence is described by a Gaussian copula with parameter ρ. Setting u_i = F_i(x_i) and u_j = F_j(x_j), [32, Section 4.3.1] derives the density of the bivariate Gaussian copula as

    c(u_i, u_j) = (1 - ρ^2)^{-1/2} exp( - [ ρ^2 (z_i^2 + z_j^2) - 2 ρ z_i z_j ] / [ 2 (1 - ρ^2) ] ),   (19)

where one uses the transformation z_i = Φ^{-1}(u_i), z_j = Φ^{-1}(u_j). The right-hand side in (19) is the density quotient for a bivariate Gaussian copula.

Proposition 6. Let X and X' be i.i.d. random vectors. The following equality holds:

    τ_u = (1/2) E[ ι_u(X', X) (g(X'_u : X_{-u}) - g(X))^2 ].   (20)

Given n independent copies X_i of X, i = 1, 2, ..., n, an unbiased estimator of τ_u is

    τ̂_u = (1 / [2 n (n-1)]) Σ_{i ≠ j} ι_u(X_j, X_i) (g(X_{j,u} : X_{i,-u}) - g(X_i))^2.   (21)

Proof. Consider an independent copy X' of X. Projections onto the index subsets u and -u keep independence intact, i.e., X'_u = P_u(X') and X_{-u} = P_{-u}(X) are independent. The random vector obtained by glueing these two vectors together therefore has density f_u · f_{-u}, breaking the inter-block dependence. Hence, for a measurable function h : R^d → R, we generally have

    E[h(X'_u : X_{-u})] = ∫∫ h(x'_u : x_{-u}) f_u(x'_u) f_{-u}(x_{-u}) dx'_u dx_{-u}.

Now, in order to compare this with the expectation of h(X), with X possibly being dependent, we split the argument X via projections onto the subdimensions indexed by u and -u, so that we may write the joint density as the product of marginal and conditional density, f(x) = f_{u|-u}(x_u | x_{-u}) f_{-u}(x_{-u}). Let us now consider the weighted Jansen's estimand. By (12) and (18),

    (1/2) E[ ι_u(X', X) (g(X'_u : X_{-u}) - g(X))^2 ]
        = (1/2) ∫∫ (g(x'_u : x_{-u}) - g(x))^2 f(x) f_{u|-u}(x'_u | x_{-u}) dx'_u dx = τ_u.

By definition, a sample consists of n independent copies of X. Hence two different copies of X, X_i and X_j, i, j = 1, ..., n, i ≠ j, are independent, and averaging ι_u(X_j, X_i)(g(X_{j,u} : X_{i,-u}) - g(X_i))^2 / 2 over all ordered pairs yields the unbiased estimator (21).
Equation (21) combines a brute-force double-loop sample design with a Jansen's estimator modified by an importance-sampling weight. First, it allows for a pick-and-freeze sample design. However, a pick-and-freeze sample only provides us with a product of marginal distributions. The density quotient in Proposition 6 introduces a correction factor that allows one to switch from the product of the marginals back to the (correct) joint distribution. Moreover, the density quotient allows us to consider non-Cartesian input domains, as it vanishes for points outside the region where the inputs are defined. If the features are independent, we regain Jansen's classical estimator, because the density quotient is identically equal to one.
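The mix-and-reweight construction can be sketched for the bivariate Gaussian pair of Example 5, where the density quotient is the Gaussian copula density. The correlation value and model are our illustrative assumptions; for g(x) = x1 + x2 with unit-variance margins, τ_1 = 1 - ρ^2.

```python
import numpy as np

rng = np.random.default_rng(3)

# Bivariate standard-normal features with correlation rho (assumption).
rho = 0.6
n = 1_000
Z = rng.standard_normal((n, 2))
X = np.column_stack([Z[:, 0], rho * Z[:, 0] + np.sqrt(1 - rho**2) * Z[:, 1]])
g = lambda x1, x2: x1 + x2
Y = g(X[:, 0], X[:, 1])

# Density quotient for standard-normal margins: the Gaussian copula density.
def iota(x1, x2):
    q = rho * (rho * (x1**2 + x2**2) - 2 * x1 * x2) / (2 * (1 - rho**2))
    return np.exp(-q) / np.sqrt(1 - rho**2)

# Double loop over ordered pairs (i, j), i != j: mix x1 from copy j with
# x2 from copy i, reweight by the density quotient, and average.
X1j = X[:, 0][None, :]            # x'_1 taken from copy j (columns)
X2i = X[:, 1][:, None]            # x_{-1} kept from copy i (rows)
W = iota(X1j, X2i)
D2 = (g(X1j, X2i) - Y[:, None]) ** 2
mask = ~np.eye(n, dtype=bool)     # exclude i == j (those pairs are not independent)
tau_1 = 0.5 * np.mean((W * D2)[mask])
print(round(tau_1, 2))  # analytical value: 1 - rho^2 = 0.64
```

The weight is large where the mixed point is plausible under the joint density and vanishes where it is not, which is exactly what allows the design to handle constrained supports.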
Lemma 7. The mix-and-reweight estimator (21) of Proposition 6 is a U -statistic of order 2.
Proof. Let us define the random vector W = (W_1, W_2) = (X_u, X_{-u}) (the random vector X split according to the index set u). Let W_i, i = 1, ..., n, be i.i.d. copies of W.
We then write the estimator as

    τ̂^U_{u,n} = (2 / [n(n-1)]) Σ_{i < j} Φ_s(W_i, W_j),

with the symmetric kernel

    Φ_s(W_i, W_j) = (1/4) [ ι_u(X_j, X_i) (g(X_{j,u} : X_{i,-u}) - g(X_i))^2 + ι_u(X_i, X_j) (g(X_{i,u} : X_{j,-u}) - g(X_j))^2 ].

Hence, τ̂^U_{u,n} defines a U-statistic of order 2. We call the design associated with Proposition 6 the mix-and-reweight approach. It is associated with a computational cost of dn^2 - dn + n model evaluations (Table 1, third row).
From the general theory of U-statistics, and drawing from the classical findings of [24], it is possible to obtain the variance and derive a central limit theorem for the mix-and-reweight estimator.
The asymptotic variance in Lemma 8 can be estimated with the plug-in jackknife estimator of [23] (see also [3, p. 106]). As an alternative, a bootstrap distribution of the estimator τ̂^U_{u,n} can be derived from the sample of Φ_s(X_{i,u}, X_{j,-u}). We find earlier accounts of the use of reweighting techniques in sensitivity analysis. Let us mention the estimation of first-order and total Sobol' indices in [57], in which an already available sample of simulations is reweighted with a weighting scheme based on importance sampling, but with the roles of the target distribution and the sampling distribution reversed. Also, in [1, Section 5.4], the authors discuss reweighting and rejection techniques to measure the potential impact of small changes in the input probability distribution on the output mean. In [34], a rejection technique to handle non-Cartesian input domains is implemented.

Derangement and Shift Estimators
The mix-and-reweight estimator of Lemma 8 possesses the clear theoretical advantages associated with the notion of U-statistics. However, the associated estimation cost may turn into a notable disadvantage in practical applications. To reduce the cost of estimating τ_u under input constraints, we propose two new estimators based on derangements and shifts. The intuition is to compare a given realization to the next one (or to one obtained via a random pick) instead of comparing it against all other realizations. This makes the cost drop from quadratic to linear in the sample size. Having fixed a sample block of size n, we introduce the cyclic shift-by-one of {1, ..., n} defined by s_n(i) = i + 1 for i < n, and s_n(n) = 1. We also define the acyclic shift-by-one by s^a_{n-1}(i) = i + 1 for i < n - 1, and s^a_{n-1}(n - 1) = n. Here, s^a_{n-1}(·) is a fixpoint-free map from {1, ..., n - 1} to {2, ..., n}. We have the following result.

Theorem 9. Let X be a sample of size n subject to the joint density f. Then the shift-and-reweight total effect estimator for factor j defined as

    τ̂^S_{j,n} = (1/n) Σ_{i=1}^n V^i_j,   with V^i_j = (1/2) ι_j(X_{s_n(i)}, X_i) (g(X_{s_n(i),j} : X_{i,-j}) - g(X_i))^2,   (28)

is unbiased and satisfies the following central limit theorem:

    √n (τ̂^S_{j,n} - τ_j) →_d N(0, σ^2_j),   σ^2_j = V[V^1_j] + 2 cov(V^1_j, V^2_j).   (29)

Proof. First, the ith realization and the s_n(i)th one in the input sample are independent. We thus deduce from (22) that E[V^i_j] = τ_j for any 1 ≤ i ≤ n; thus the estimator τ̂^S_{j,n} is unbiased. To prove the central limit theorem, we first decompose τ̂^S_{j,n} into τ̂^a_{j,n-1} and a remainder term, where τ̂^a_{j,n-1} stands for the estimator built from Formula (28) on (X_1, ..., X_{n-1}) with the cyclic shift-by-one s_n replaced by the acyclic one. The sequence of summands is stationary and 1-dependent, so the central limit theorem for 1-dependent stationary sequences applies. Finally, noting that the remainder term is asymptotically negligible and applying Slutsky's theorem, we get (29).
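The shift-and-reweight estimator pairs each realization with its cyclic successor only, so the cost is linear in n. A minimal NumPy sketch (our toy setup: bivariate standard normals with correlation ρ, g(x) = x1 + x2, hence τ_1 = 1 - ρ^2):

```python
import numpy as np

rng = np.random.default_rng(4)

rho = 0.6
n = 200_000
Z = rng.standard_normal((n, 2))
X = np.column_stack([Z[:, 0], rho * Z[:, 0] + np.sqrt(1 - rho**2) * Z[:, 1]])
g = lambda x1, x2: x1 + x2

def iota(x1, x2):  # density quotient (Gaussian copula density, normal margins)
    q = rho * (rho * (x1**2 + x2**2) - 2 * x1 * x2) / (2 * (1 - rho**2))
    return np.exp(-q) / np.sqrt(1 - rho**2)

shift = np.roll(np.arange(n), -1)           # cyclic shift-by-one s_n
x1_s = X[shift, 0]                           # x'_1 taken from the shifted copy
W = iota(x1_s, X[:, 1])
D2 = (g(x1_s, X[:, 1]) - g(X[:, 0], X[:, 1])) ** 2
tau_1 = 0.5 * np.mean(W * D2)                # cost is linear in n
print(round(tau_1, 2))  # analytical value: 1 - rho^2 = 0.64
```

Because consecutive summands share one observation, the terms are 1-dependent rather than independent, which is why the limit theory above is stated for 1-dependent sequences.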
Corollary 10. Let X be a sample of size n subject to the joint density f. Define the shift-and-reweight normalized total effect estimator for factor j as T̂^S_{j,n} = τ̂^S_{j,n} / V̂_n[Y], with V̂_n[Y] the empirical variance of Y. Assume T_j < +∞ and σ^2_j < +∞, with σ^2_j defined in Theorem 9. Then we have:

    √n (T̂^S_{j,n} - T_j) →_d N(0, ς^2_j).

Proof. The result follows from Theorem 9 by applying the Delta method (see, e.g., [59]). First, by mimicking the proof of Theorem 9, it is possible to prove that, for any α, β, γ, a central limit theorem holds for α τ̂^S_{j,n} + β (1/n) Σ_i Y_i + γ (1/n) Σ_i Y_i^2, so that a multivariate central limit theorem holds for the triple (τ̂^S_{j,n}, (1/n) Σ_i Y_i, (1/n) Σ_i Y_i^2). We then obtain the central limit theorem for T̂^S_{j,n} by applying the Delta method to ψ(x, y, z) = x / (z - y^2). More precisely, letting ρ_j denote the gradient of ψ at θ_j = (τ_j, E[Y], E[Y^2]), the asymptotic variance is ς^2_j = ρ_j^T Γ_j ρ_j, with Γ_j the asymptotic covariance matrix of the triple. This concludes the proof of Corollary 10.
It is possible to generalize Theorem 9 by dealing with more general permutations than the cyclic shift-by-one, as stated in Theorem 11 below.
Theorem 11. Let X be a sample of size n subject to the joint density f. Let (π_n)_{n≥1} be a sequence of derangements (fixpoint-free permutations) of {1, ..., n}. Then the derange-and-reweight total effect estimator for factor j defined as

    τ̂^D_{j,n} = (1/(2n)) Σ_{i=1}^n ι_j(X_{π_n(i)}, X_i) (g(X_{π_n(i),j} : X_{i,-j}) - g(X_i))^2

is unbiased. Suppose that there exists δ > 0 such that E[|V^1_j|^{2+δ}] < +∞, with V^1_j defined as in Theorem 9, and that lim_{n→+∞} m_n^{1+δ} n^{-δ/2} = 0, with m_n the number of cycles of π_n. Then we have the following central limit theorem:

    √n (τ̂^D_{j,n} - τ_j) →_d N(0, σ^2_j).

The assumption lim_{n→+∞} m_n^{1+δ} n^{-δ/2} = 0 is not very restrictive. Indeed, a permutation π_n of {1, ..., n} decomposes into cycles, and a classical result in combinatorics lets us expect 1 + 1/2 + ... + 1/n cycles for a uniformly random permutation; this harmonic sum is approximately log(n).
Proof. To prove that τ̂^D_{j,n} is unbiased, we use the same arguments as for τ̂^S_{j,n} in Theorem 9, noting in addition that π_n(i) ≠ i for all i = 1, ..., n. To prove the central limit theorem, we first decompose, for each n, the permutation π_n into cycles C_{1,n}, ..., C_{m_n,n}. Let us arbitrarily fix the first element in each cycle. We then form p_n blocks, with p_n = max_{1≤k≤m_n} ℓ_{k,n} and ℓ_{k,n} the length of cycle C_{k,n}. For each n, we re-order the X_i's so that the first b_{1,n} = m_n re-ordered variables are the first elements of each cycle, the next b_{2,n} re-ordered variables are the second elements of each cycle of length at least two, and so on, until the last b_{p_n,n} = n - Σ_{v=1}^{p_n-1} b_{v,n} re-ordered variables, which are the last elements in each cycle of length p_n. Here, 1 ≤ b_{p_n,n} ≤ ... ≤ b_{1,n} = m_n. For each n, we denote by (X_{i,n}) the re-ordered sequence and form block sums S_{v,n} as in the proof of Theorem 9, but with the X_{i,n}'s in place of the X_i's. More precisely, we use the trick in the proof of Theorem 9, replacing first the last variable X_i in cycle C_{1,n} by X_{n+1}, ..., and the last variable X_i in cycle C_{m_n,n} by X_{n+m_n}. As lim_{n→+∞} m_n/√n = 0, the remaining term decreases fast enough not to perturb the result of the central limit theorem (see the proof of Theorem 9 for more details). Now, as lim_{n→+∞} m_n/√n = 0, we have p_n ≥ n/m_n ≥ √n/m_n → +∞. For each n ≥ 1, the sequence (S_{v,n})_{1≤v≤p_n} is a sequence of 1-dependent variables. Then, applying [42, Theorem 2.1], we get the central limit theorem as soon as there exists δ > 0 such that E[|V^1_j|^{2+δ}] < +∞. Indeed, as the variables inside each block are centered, independent and identically distributed, Assumption (2.1) in [42, Theorem 2.1] holds, due to the stationarity of (V^i_j V^{i+1}_j) implied by 1-dependence. Assumption (2.2) of [42, Theorem 2.1] holds as well: for any ε > 0, the corresponding Lindeberg-type condition follows from the moment assumption together with lim_{n→+∞} m_n^{1+δ} n^{-δ/2} = 0. This concludes the proof of Theorem 11.
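Any fixpoint-free permutation can replace the cyclic shift. As a sketch (our toy setup again: ρ-correlated standard normals, g(x) = x1 + x2, τ_1 = 1 - ρ^2), we build a random single-cycle derangement by following a shuffled ordering cyclically:

```python
import numpy as np

rng = np.random.default_rng(5)

rho = 0.6
n = 100_000
Z = rng.standard_normal((n, 2))
X = np.column_stack([Z[:, 0], rho * Z[:, 0] + np.sqrt(1 - rho**2) * Z[:, 1]])
g = lambda x1, x2: x1 + x2

def iota(x1, x2):  # density quotient for standard-normal margins
    q = rho * (rho * (x1**2 + x2**2) - 2 * x1 * x2) / (2 * (1 - rho**2))
    return np.exp(-q) / np.sqrt(1 - rho**2)

# A random n-cycle is a derangement: map order[k] -> order[k+1 mod n].
order = rng.permutation(n)
pi = np.empty(n, dtype=int)
pi[order] = order[np.arange(1, n + 1) % n]
assert np.all(pi != np.arange(n))             # fixpoint-free

x1_pi = X[pi, 0]
tau_1 = 0.5 * np.mean(iota(x1_pi, X[:, 1])
                      * (g(x1_pi, X[:, 1]) - g(X[:, 0], X[:, 1])) ** 2)
print(round(tau_1, 2))  # analytical value: 1 - rho^2 = 0.64
```

A single n-cycle has m_n = 1, so the cycle-count condition of Theorem 11 is trivially satisfied.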
Remark 12. It is possible to compute confidence intervals by block-bootstrapping. It is important to implement bootstrapping with blocks, in order to preserve the 1-dependence structure.

Nearest-Neighbour Estimation
Thus far, we have introduced and discussed designs for obtaining total effects that generalize Jansen's intuition. They are listed in rows 2 to 5 of Table 1. The costs of these estimators are upper bounds: whenever the density quotient vanishes, the model output does not contribute to the estimation, so there is no need to evaluate the model at points with a density quotient of zero, and we save computational time. However, these designs require a simulator in the loop, as the recombined points need to be evaluated through the model. Also, their cost depends on the number of features d, so they can be exposed to the curse of dimensionality. For the same n, a design that does not depend on d is nominally the computationally cheapest. We study a nearest-neighbour approach that achieves such a d-independent cost. The construction is as follows. Consider two observations x and x', and consider the recombined point (x'_j : x_{-j}). The first step is the evaluation of the density quotient ι_j(x', x) at this point. If the density quotient is non-negligible, then select the point closest to (x'_j : x_{-j}) among the available data with respect to a predefined (e.g., Euclidean) metric. More precisely, the selected point is the solution of

    x* = argmin_{x'' ∈ {x_1, ..., x_n}} || x'' - (x'_j : x_{-j}) ||.   (32)

Then, use the corresponding value of Y as a proxy for g(x'_j : x_{-j}) in (21). Otherwise, if the quotient is small, we set the contribution of ι_j(x', x) × (g(x'_j : x_{-j}) - g(x))^2 equal to zero. This step can be implemented by setting a threshold on the value of the density quotient and regarding as negligible all values below the threshold. All the information required to apply this design is contained in a sample generated by a once-through pass of a Monte Carlo simulation. The nearest-neighbour approach serves here as a metamodel, predicting the model output at the mixed input sample. This is a different use of the nearest-neighbour intuition than in [5,49], where nearest neighbours are used to select a conditional stratum (see [11,12] for theoretical results). At this stage, also due to the presence of the density quotient threshold, we do not furnish theoretical results for the estimator based on (32). We instead evaluate this strategy through empirical comparisons in a series of experiments, comparing its performance with the other estimators in Table 1.
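The construction above can be sketched with a brute-force nearest-neighbour search; a KD-tree would be used in practice. The correlation value, model, and threshold are our illustrative assumptions (toy setup as before: ρ-correlated standard normals, g(x) = x1 + x2, τ_1 = 1 - ρ^2).

```python
import numpy as np

rng = np.random.default_rng(6)

rho = 0.6
n = 2_000
Z = rng.standard_normal((n, 2))
X = np.column_stack([Z[:, 0], rho * Z[:, 0] + np.sqrt(1 - rho**2) * Z[:, 1]])
Y = X[:, 0] + X[:, 1]                       # once-through pass of the model

def iota(x1, x2):  # density quotient for standard-normal margins
    q = rho * (rho * (x1**2 + x2**2) - 2 * x1 * x2) / (2 * (1 - rho**2))
    return np.exp(-q) / np.sqrt(1 - rho**2)

shift = np.roll(np.arange(n), -1)
mix = np.column_stack([X[shift, 0], X[:, 1]])   # mixed points (x'_1 : x_2)
W = iota(mix[:, 0], mix[:, 1])

threshold = 1e-3                # contributions below this are set to zero (assumption)
contrib = np.zeros(n)
active = W > threshold
# Nearest neighbour of each retained mixed point among the observed data;
# its output is used as a proxy for g at the mixed point (no new model runs).
d2 = ((mix[active, None, :] - X[None, :, :]) ** 2).sum(axis=2)
Y_proxy = Y[np.argmin(d2, axis=1)]
contrib[active] = W[active] * (Y_proxy - Y[active]) ** 2
tau_1 = 0.5 * np.mean(contrib)
print(round(tau_1, 2))  # analytical value: 1 - rho^2 = 0.64
```

No additional simulator calls are made after the initial pass, which is what makes the cost independent of d.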

The Link Between Total Indices and Breiman's Permutation Importance with Feature Constraints
There is a close link between MDA j and τ j visible in [26, Theorem 2] and discussed in great detail in [2].In this section, we extend the relationship to the case in which inputs are constrained.Consider the problem of training an input-output mapping of the type h(X, θ), h : X × R q → R, where θ is a q-dimensional vector of auxiliary parameters.Let L(Y, g(X; θ)), L : R × R a loss function.The training problem can be defined as finding We let E[L(Y, g(X; θ * ))] denote the nominal (and minimal) expected loss function for the machine learning problem.Then, Breiman's importance of feature X j is defined as where E[L(Y, g(X j : X −j ; θ * ))] is the expected loss that we incur if feature X j is permuted.The intuition is that if the machine learning model relies heavily on X j for its predictions, then the loss in predictive accuracy should be high and consequently the difference between the nominal loos and the loss after permutation should be significant.
After feature permutation we expect a decrease in model prediction accuracy, so that the expected loss after permutation is larger than the nominal expected loss, yielding MDA_j ≥ 0.
Of relevance to us are Equations (3.1) and (6.2) of [15]. [15] formulate Breiman's importance in terms of model reliance, the ratio of the expected loss after permutation to the expected loss before permutation. In our notation, their definition of model reliance in their Equation (3.1) (p. 8) reads MR_j = E[L(Y, g(X'_j : X_{-j}; θ*))] / E[L(Y, g(X; θ*))] (35). Rewritten in MDA terms, (35) yields MDA_j = (MR_j − 1) E[L(Y, g(X; θ*))] (36). Proposition 13. Consider MDA_j in (36). If L(Y, g(X)) is a squared loss function and g(X; θ*) is a perfect predictor, then MDA_j = τ_j.
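Breiman's intuition can be sketched as follows; the snippet is a schematic illustration with a generic predictor and loss, not a reference implementation of [4] or [15].

```python
import numpy as np

def mean_decrease_accuracy(model, X, Y, loss, rng=None):
    """Permutation importance MDA_j = E[L after permuting X_j] - E[L].

    model : any callable X -> Y_hat (e.g. a fitted predictor);
    loss  : elementwise loss, e.g. squared loss. Sketch only.
    """
    rng = np.random.default_rng() if rng is None else rng
    base = loss(Y, model(X)).mean()            # nominal expected loss
    mda = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])   # break the X_j -- Y link
        mda[j] = loss(Y, model(Xp)).mean() - base
    return mda
```

For a perfect linear predictor under squared loss, the values recover the permutation-importance ranking of the coefficients.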

Analytical Total Effects for Linear Models with Gaussian Features
To provide analytical examples, we derive closed-form expressions for the total effects to be used as benchmarks in the numerical experiments. This discussion also gives us a formula for the density quotient in the case of Gaussian copulas. Let us consider a linear mapping between Y and X, Y = β_0 + β^T X, X ∈ R^d, β_0 ∈ R, β ∈ R^d, and let the features be normally distributed with mean μ and variance-covariance matrix Σ. Under this assumption all conditional distributions are Gaussian and all conditional expectations are linear. The pair (X, Y) is then also Gaussian with augmented covariance matrix [Σ, Σβ; β^T Σ, β^T Σ β]. The correlation matrix in (38) is given in the form of a Schur complement. It is assumed that the submatrix selected by w is invertible. For a multivariate Gaussian distribution, the statement u ⊥⊥ v | w holds if and only if Γ_{u,v} = Γ_{u,w} Γ_{w,w}^{-1} Γ_{w,v}, as in this case the correlation matrix in (38) is block-diagonal.
Theorem 14. Given Y = β_0 + β^T X, X ∼ N(μ, Σ), X ∈ R^d, the unnormalized main and total effects are given by V_j = ((Σβ)_j)² / Σ_{jj} and τ_j = β_j² / (Σ^{-1})_{jj}. The output variance is V(Y) = β^T Σ β. Alternative computations are offered in [38] for the case d = 3. These results can also be retrieved from the proof of [46, Theorem 4.1].
Proof. Main effect: We consider the multivariate Gaussian distribution of (X, Y). Using the special structure of the augmented matrix, the conditional variance Var(Y | X_j) = β^T Σ β − ((Σβ)_j)² / Σ_{jj} is constant (it does not depend on the value of X_j), so that E[Var(Y | X_j)] equals this quantity. For main effects, however, we are interested in the variance of the conditional expectation, so we subtract this value from the total variance, which yields (39). Total effect: The Rosenblatt transform in the Gaussian case uses the Cholesky decomposition of the covariance matrix. The Cholesky matrix C = chol(Σ) is an upper triangular matrix such that C^T C = Σ, where the index −1 denotes all coordinates but the first (if Σ is a scalar, then chol(Σ) = Σ^{1/2}). By reordering the input factors, we may assume without loss of generality that the feature of interest is the last one. We are left with the identification of the last diagonal entry of the Cholesky matrix. Because of its hierarchical triangular structure, it keeps subdeterminants intact, so that C_{dd}² = det(Σ)/det(Σ_{-d,-d}) = 1/(Σ^{-1})_{dd}, which yields (40). Theorem 14 provides analytical formulas for main and total effects for linear models with Gaussian features. When we want to estimate total effects under a linear model with Gaussian features with the reweighting approaches introduced above, we require the density quotient. For multivariate Gaussian input distributions, it is the ratio of the joint density at the recombined point to the joint density at the original point, where (• : •) denotes the out-of-order composition of block-diagonal matrices. This readily generalizes to Gaussian copulas. In the bivariate case, setting μ = 0 and Σ = [1, ρ; ρ, 1] in (42) therefore yields (19).
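As a numerical illustration of Theorem 14, the closed forms can be coded and checked in a few lines; `gaussian_linear_effects` is a hypothetical helper name.

```python
import numpy as np

def gaussian_linear_effects(beta, Sigma):
    """Unnormalized main and total effects for Y = b0 + beta^T X,
    X ~ N(mu, Sigma): V_j = ((Sigma beta)_j)^2 / Sigma_jj and
    tau_j = beta_j^2 / (Sigma^{-1})_jj; output variance beta^T Sigma beta."""
    beta = np.asarray(beta, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    Sb = Sigma @ beta
    main = Sb ** 2 / np.diag(Sigma)                    # Var(E[Y | X_j])
    total = beta ** 2 / np.diag(np.linalg.inv(Sigma))  # E[Var(Y | X_{-j})]
    var_y = beta @ Sb                                  # total output variance
    return main, total, var_y
```

For d = 2 with unit variances and correlation ρ, this reduces to τ_j = β_j²(1 − ρ²), which matches a direct conditional-variance computation.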

Experiments
In this section we study the numerical implementation of the designs discussed in this work, using various examples that focus on different aspects. Subsection 8.1 exploits the analytical results obtained for Gaussian linear models to compare numerical estimates with analytical results and show asymptotic consistency. Subsection 8.2 uses the Ishigami function to compare the reweighting estimators in greater detail. Subsection 8.3 considers the case in which inputs are constrained to lie in a circle; here the winding stairs estimator can no longer be used, and the section compares the nearest-neighbour and double-loop approaches. Subsection 8.4 addresses the case in which inputs are constrained to a non-connected domain (two separate triangles). Subsection 8.5 reports results for inputs constrained on a simplex, based on the case study of [20]. Subsection 8.6 reports results for inputs constrained on complex and non-connected domains that can be modeled as Sierpinski gaskets. Subsection 8.7 reports results for a realistic simulator. All experiments are run on a personal computer with an Intel(R) i7-3770 CPU at 3.40GHz and 8GB RAM, using MatLab R2022a. We rely on the MatLab k-d-tree implementation for the nearest-neighbour search. In all experiments we implemented the nearest-neighbour approach (see Section 5) with the threshold set to zero. In these experiments we compare the estimators in Table 1. As shown in the third column, the methods come with different computational costs. To make the experiments comparable, we fix the overall budget and then derive the block sample size n. To illustrate, if the budget is, say, B = 10000 model runs, for a d = 3 variable model we obtain sample block sizes of n = 2500 for the generalized winding stairs and the shift(derange)-and-reweight designs, and n = 58 for the U-statistic design, while B = n = 10000 for the nearest-neighbour design.
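The budget-to-block-size computation can be sketched as follows, using the per-design costs implied by the worked numbers above: winding stairs and reweighting designs cost n(d+1) runs, the U-statistic design costs n(n−1)d runs, and the nearest-neighbour design costs n runs. This is our reading of Table 1, reconstructed from the examples in the text.

```python
import math

def block_sizes(B, d):
    """Largest block sample sizes n that fit a total budget of B model
    runs, per design (costs reconstructed from the worked examples)."""
    n_ws = B // (d + 1)                       # winding stairs / reweight
    # solve n(n-1)d <= B for the U-statistic design
    n_u = int((1 + math.sqrt(1 + 4 * B / d)) / 2)
    while n_u * (n_u - 1) * d > B:            # guard against rounding
        n_u -= 1
    return {"winding_stairs": n_ws, "reweight": n_ws,
            "u_statistic": n_u, "nearest_neighbour": B}
```

For B = 10000 and d = 3 this reproduces the sizes quoted above (2500, 58 and 10000).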

Linear Model with Normal and Correlated Inputs
We consider the linear model with correlated Gaussian features also discussed in [35]; the correlation matrix Σ is parameterized by a correlation coefficient ρ. We also test the effect of alternative sample generators, namely crude Monte Carlo (MC) and randomized Quasi-Monte Carlo (RQMC). For the randomization we employ the MatLab Sobol' sequence scrambler, following [44,40]. Applying Theorem 14, we obtain the values of the total effects for this model analytically. They can then be used to study convergence and uncertainty in the estimates. In our parametrization, we set σ = 2 and let ρ vary in [−1, 1]. For the numerical experiments, we hypothesize a computational budget restriction with a fixed budget B. Fixed budgets are of interest for computationally expensive models, where one faces a trade-off between statistical accuracy and the need to keep computational time under control. Given the designs in Table 1, once the total budget is fixed, one derives the corresponding basic sample sizes n as previously explained. We set B = 1680, which, with d = 3 input factors, leads to n = 24 for the U-statistic, n = 420 for the winding stairs and reweighting approaches, and n = 1680 for the nearest-neighbour approach. Figure 1 shows the results for alternative values of ρ (that is, we repeat the calculations with the fixed budget B for ρ ∈ {−0.9, −0.75, −0.6, ..., 0.6, 0.75, 0.9}). Each graph reports the values of ρ on the horizontal axis. On the vertical axis, the corresponding analytical values of τ_i (i = 1, 2, 3) are displayed as continuous lines and the estimates as dashed lines. Confidence intervals are displayed as shaded areas. For the U-statistic approach, we use the asymptotic normality result of Lemma 8 together with plug-in estimates of the estimator variance. The confidence bounds for the winding stairs estimates follow from [21]. For the shift-and-reweight approach (panels C and D) and the derange-and-reweight approach (panels G and H), we use the upper and lower 2.5% quantiles from a block bootstrap. In further tests, a normal approximation using the estimator variance from the asymptotic results of Theorem 9 or Theorem 11 offered comparable intervals. Figure 1 shows that the point estimates from QMC designs (using scrambled sequences) are generally more accurate than those from a plain Monte Carlo design, while the confidence bounds are comparable. Preliminary tests showed that the shift-and-reweight approach does not work well with a QMC design; we therefore replaced the shift with a permutation, so that a derange-and-reweight approach is used for panels G and H. Considering the mean squared errors over repeated simulations, one can conclude that for this example a QMC design is an advantage for the U-statistic and winding stairs methods, while the impact is less clear for the derange-and-reweight methods (both with reevaluations and with nearest-neighbour approximations of the mixed sample).
In this example, investing the computational budget into one large sample and using a nearest-neighbour metamodeling approach offers good performance, beaten only by the winding-stairs QMC design, which, as already remarked, is not necessarily available under general dependence in the form of constraints on the input features (these visual findings are corroborated by the corresponding mean squared errors; see Table 2).

The Ishigami Function with Correlations under a Gaussian Copula
We further test the proposed designs on a well-known test case, the Ishigami function with correlated inputs. This test case has been used in previous studies on total effects with dependent features, such as [35]. We also use these experiments to continue our investigation of the effect of the random sample generator on the estimators. The input-output mapping is Y = g(x_1, x_2, x_3) = sin(x_1) + 7 sin²(x_2) + 0.1 x_3⁴ sin(x_1), with uniform marginals on [−π, π]. Following [35, Section 7.3], we introduce a statistical dependence between X_1 and X_3 by prescribing a pairwise Gaussian copula with a varying correlation coefficient. For the winding stairs approach we use the inverse Knothe-Rosenblatt transformation detailed in Example 5. For the reweight strategies, we implement the rank correlation using (17). We let the correlation coefficient ρ(X_1, X_3) range from −0.9 to 0.9. While there is no closed-form solution for the total indices at a generic value of ρ(X_1, X_3), a reference value is available at ρ(X_1, X_3) = 0, for which the total indices are analytically known: T_1 = 0.56, T_2 = 0.44 and T_3 = 0.24, respectively. We fix a budget of about B = 8200 simulations. This yields a basic sample size of n = 2048 for the generalized winding stairs and the derange-and-reweight approach, n = 53 for the U-statistic, and n = 8192 points for the nearest-neighbour estimator. We generate the sample first with crude MC and then with QMC.
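A minimal sketch of this experimental setup, assuming uniform marginals on [−π, π] and a pairwise Gaussian copula between X_1 and X_3 (function and variable names are ours):

```python
import numpy as np
from scipy.stats import norm

def ishigami(x):
    """Ishigami function on an (n, 3) input array."""
    return (np.sin(x[:, 0]) + 7 * np.sin(x[:, 1]) ** 2
            + 0.1 * x[:, 2] ** 4 * np.sin(x[:, 0]))

def sample_ishigami_inputs(n, rho13, rng):
    """Uniform marginals on [-pi, pi]; Gaussian copula with
    correlation rho13 between X1 and X3 (sketch)."""
    z = rng.standard_normal((n, 3))
    z[:, 2] = rho13 * z[:, 0] + np.sqrt(1 - rho13 ** 2) * z[:, 2]
    u = norm.cdf(z)                   # Gaussian copula -> uniform scores
    return -np.pi + 2 * np.pi * u     # push to the Ishigami domain
```

The copula acts only on the dependence structure; the marginals remain uniform regardless of rho13.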
Figure 2 shows the estimates of the normalized total effects T_1, T_2, T_3 as a function of the correlation between X_1 and X_3. The upper row displays estimates under crude MC generation, the lower row under QMC generation. When using crude MC, the winding stairs, derange-and-reweight and nearest-neighbour designs perform comparably. At ρ(X_1, X_3) = 0, the estimates for these designs are close to the analytical values. When using QMC, the curves of the winding stairs and U-statistic estimators exhibit increased regularity, while methods involving a random derangement do not. This may be due to random derangements breaking the properties of low-discrepancy sequences. Despite the greater regularity of the U-statistic estimates as a function of ρ(X_1, X_3), they are upward biased for T_1 and most notably for T_2, compared to the other approaches. A reason may be that at n = 53, the QMC Sobol' sequence does not populate a Latin hypercube and the projections on the marginals are not uniform. To fill a Latin hypercube, we would need to increase the basic sample size to the next power of 2, n = 64. However, these additional eleven points in the sample block would propagate into a new budget of B = 12160, a nearly 50% increase in computational cost. Lastly, the rightmost panels show that the nearest-neighbour estimator performs similarly with both sample generation methods.

Features Constrained on a Circle
While the previous examples featured dependence structures given by Gaussian copula correlation, this example studies a non-rectangular support. Let us consider a two-dimensional input-output mapping with X_1 and X_2 uniformly distributed within a circle of radius π centered at the origin. Figure 3 shows an input sample of size n = 1024 and the model output surface. As the output density can be computed analytically, calculation with symbolic software (Mathcad Prime 8 in our case) yields unnormalized total effects τ_1 = τ_2 = 6.527. This geometry rules out the application of a design based on the Knothe-Rosenblatt transform for calculating total effects: we would need the quantile function corresponding to the marginal cdf of X_1, plugged into the conditional quantile function of X_2 given X_1. Even for this simple geometry, this is a daunting task. In contrast, in the case of a constrained domain C, the joint density is x ↦ 1_C(x) f_X(x) / ∫_C f_X(x) dx, where f_X denotes the probability density without the constraint. The marginal distributions can be obtained by integrating the joint distribution, and the probability of X ∈ C can be estimated by rejection sampling. In the case of an output constraint D, in the context of target sensitivity, the input constraint C becomes the inverse image of D under g: C = {x : g(x) ∈ D} = g^{-1}[D]. In the example discussed here, C is a circle of radius R, and the density quotient to be used in our proposed methods is given in (44). Equation (44) is symmetric in its arguments, hence it can be used for computing the sensitivity of both input factors. We use the experiments to investigate the convergence rate of the derange-and-reweight approach of Theorem 9 under different sampling strategies: plain Monte Carlo, Quasi-Monte Carlo and randomized Quasi-Monte Carlo. The mean squared errors, averaged over both factors, follow an O(n^{-1}) convergence rate for all sampling strategies, as
seen in Figure 4, where the diagonal of the dashed triangle indicates the n^{-1} rate. The estimators are therefore of order O(n^{-1/2}), as for standard Monte Carlo estimation. We observe a similar convergence rate across the alternative generators. A reason may be that the shifting strategy interferes with the regularity of the QMC structure, reducing the advantage of a QMC generator in this context. Furthermore, a constraint may introduce discontinuities that are incompatible with bounded variation in the sense of Hardy and Krause, in which case the Koksma-Hlawka theorem does not provide an improved convergence rate.
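For the uniform distribution on a disc of radius R, the ingredients of the density quotient can be written down explicitly: the joint density is 1/(πR²) on the disc and the marginals are f_i(t) = 2√(R² − t²)/(πR²), so that ι(x) = 1_C(x) · πR² / (4√(R² − x_1²)√(R² − x_2²)). The sketch below follows our own derivation of the quotient; the presentation in (44) may differ.

```python
import numpy as np

def sample_disc(n, R, rng):
    """Rejection sampling: uniform points on the disc of radius R."""
    pts = []
    while len(pts) < n:
        x = rng.uniform(-R, R, size=(n, 2))
        pts.extend(x[np.hypot(x[:, 0], x[:, 1]) <= R])
    return np.asarray(pts[:n])

def iota_disc(x, R):
    """Density quotient f_12 / (f_1 f_2) for the uniform disc
    (our derivation; zero outside the disc)."""
    inside = np.hypot(x[..., 0], x[..., 1]) <= R
    with np.errstate(invalid="ignore", divide="ignore"):
        denom = 4 * np.sqrt(R**2 - x[..., 0]**2) * np.sqrt(R**2 - x[..., 1]**2)
        q = np.pi * R**2 / denom
    return np.where(inside, q, 0.0)
```

At the origin the quotient equals π/4 for any radius, since the joint density there exceeds the product of the marginals.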

Features Constrained on Two Disconnected Triangles
In this section, we consider experiments with features constrained on a disconnected domain. We hypothesize that the model is g(x_1, x_2) = (x_1 − 1)(x_2 − 1) and that the features lie in a two-dimensional domain formed by two disjoint triangles (Figure 5). The joint input density f_12 equals 4 within the triangles and vanishes outside. With a symbolic calculator, one can compute analytical formulas for the marginal densities f_1 and f_2 by integrating f_12. In this case, the sensitivity indices can also be computed analytically: theoretical computations yield normalized total effects of T_1 = 0.037 and T_2 = 0.27 (the total variance is 0.117). We now test these values against a numerical implementation. Sampling is performed by rejection, starting from a uniform distribution on the square; the procedure generates dependence between X_1 and X_2. Figure 5 shows a sample of realizations consistent with this domain. We run experiments to test the proposed reweighting methods. Table 3 shows the results for the shift-and-reweight method of Theorem 9. Because rejection sampling is used and mixture realisations that fall outside the constraints are not evaluated, the sample sizes do not form a regular progression. The right part of the table reports the results when the basic sample is processed with nearest neighbours, i.e., the number of model evaluations is reduced to the size of the basic sample block. One observes only minor differences between the two approaches. The estimator variances in Table 3 are computed from a plug-in estimator (see the discussion following Lemma 8).

Features Constrained on a Simplex
We implement a dependence between the last and last-but-one input by restricting the inputs to satisfy a simplex condition (Figure 6). For d = 3 and g(x_1, x_2, x_3) = x_1 x_2 − x_3, the analytical total effects are T = (0.125, 0.125, 0.375). For d = 4 and g(x) = −x_1 + x_1 x_2 − x_1 x_2 x_3 + x_1 x_2 x_3 x_4, the analytical values are T = (0.6300, 0.4861, 0.0064, 0.0064). A third variant uses the well-known g-function with d = 4 and parameter vector a = (0, 1, 3, 6), again with the simplex condition on the third and fourth argument; here the analytical values are T = (0.7659, 0.2357, 0.0522, 0.0172). These examples were introduced in [20] to test a new space-filling sampling strategy for estimating grouped Sobol' effects with constrained inputs (see [28]). We perform simulations using the shift-and-derange estimators of Theorem 9 with random shifts. The mean squared errors for all examples (averaged over 20 runs) are reported in Figure 7. Without preprocessing the QMC sample by permuting the order of realizations, the shift method fails. However, using (permuted) scrambled Sobol' sequences as input does not change the convergence rate, which remains consistent with the Monte Carlo approximation error. The nearest-neighbour approach even shows the best performance in the first two examples.
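For reference, the g-function used in the third variant can be evaluated as follows. Under independent uniform inputs each factor has unit mean, so E[g] = 1, a property the simplex constraint destroys (the exact constraint condition is not reproduced here):

```python
import numpy as np

def g_function(x, a):
    """Sobol' g-function: prod_i (|4 x_i - 2| + a_i) / (1 + a_i)."""
    a = np.asarray(a, dtype=float)
    return np.prod((np.abs(4 * np.asarray(x) - 2) + a) / (1 + a), axis=-1)

# sanity check: under independent U(0,1) inputs, each factor has mean 1,
# so the Monte Carlo mean of g should be close to 1
rng = np.random.default_rng(0)
sample_mean = g_function(rng.random((100000, 4)), (0, 1, 3, 6)).mean()
```

Small a_i values make a factor influential (a_1 = 0 is the most important input), which is consistent with the ranking T_1 > T_2 > T_3 > T_4 above.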

Features Constrained on Sierpinski Gaskets
In this section, we consider an experiment that starts with a Cartesian support and removes parts of it until we reach an almost fractal structure, an approximation of the Sierpinski gasket, which contains holes and is not star-shaped. Figure 8 shows the regions on which the two input features are constrained.
For the model, we consider a two-dimensional input-output mapping of the form Y = X_1 + βX_2. In the experiments we consider alternative values of the model parameter β, namely {−2, −1, 0, 1, 2}. Although this is a family of linear models, it suffices to demonstrate the subtleties that occur when working with dependent input data. The first two domains are formed by cutting corners off the unit square: a cut-one-corner constraint consisting of all points of [0, 1]² excluding the pairs (x_1, x_2) with x_1 + x_2 > 3/2, and a cut-two-corners constraint, where additionally points satisfying x_1 + x_2 < 1/2 are excluded. Then, we approximate a fractal structure, the Sierpinski gasket, by excluding all realizations (x_1, x_2) that satisfy, for k = 1, 2, ..., 5, three conditions expressed through the function x ↦ mod(x, 2) = x − 2⌊x/2⌋, the remainder after integer division by 2. To use the reweighting approach with these constraints, we write the density quotient as the indicator function of the acceptance region divided by the product of the marginal densities; the required normalizing constant is related to the acceptance rate. The expression of the density quotient can be derived analytically. For the cut-one-corner constraint, the joint density is f_12(x) = (8/7) 1_C(x), and the marginal densities are f_1(t) = f_2(t) = 8/7 for t ≤ 1/2 and (8/7)(3/2 − t) for t > 1/2. For the cut-two-corners design, the joint density is f_12(x) = (4/3) 1_C(x), with marginal densities f_1(t) = f_2(t) = (4/3)(1/2 + t) for t ≤ 1/2 and (4/3)(3/2 − t) for t > 1/2.
The acceptance rate enters both the marginal distributions and the joint one, so that its inverse appears as a multiplicative constant in the density quotient. For the Sierpinski gasket, the joint distribution is proportional to the product of the indicator functions of the complements of the Sierpinski sets listed above. To proceed with the numerical experiments, we use a rejection method: we generate data by crude Monte Carlo sampling and reject feature realizations that do not satisfy the constraints. For the cut-one-corner and cut-two-corners constraints, we start with a sample of size n = 10240 on the full Cartesian domain [0, 1]². After rejecting realizations falling outside the domain, we are left with n = 8938 realizations for the cut-one-corner domain and n = 7690 realizations for the cut-two-corners domain. These numbers are in line with the theoretical acceptance rates of 7/8 and 3/4, respectively.
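The corner-cut constraints and their theoretical acceptance rates are easy to verify by simulation, since each removed corner triangle has area 1/8:

```python
import numpy as np

def accept_one_corner(x):
    """Keep points of [0,1]^2 with x1 + x2 <= 3/2 (one corner cut)."""
    return x[:, 0] + x[:, 1] <= 1.5

def accept_two_corners(x):
    """Additionally exclude the opposite corner: x1 + x2 >= 1/2."""
    return accept_one_corner(x) & (x[:, 0] + x[:, 1] >= 0.5)

# empirical acceptance rates should approach 7/8 and 3/4 respectively
rng = np.random.default_rng(1)
x = rng.random((200000, 2))
rate1 = accept_one_corner(x).mean()
rate2 = accept_two_corners(x).mean()
```

The retained points then feed the reweighting estimators, with the inverse acceptance rate entering the density quotient as noted above.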
For the Sierpinski gasket, we start with a base random sample of size n_0 = 50000. The first step of the rejection process retains half of the observations, while each further step retains 3/4 of them. Hence, after five iterations we expect a sample size of about n_0 · (1/2) · (3/4)⁴ ≈ 7910. After rejection, we are left with n = 7932 observations. Figure 9 shows the empirical marginal densities of the features. Iteration k = 1 removes 1/2 of the mass, and each further iteration k > 1 removes 1/4 of it. Rescaling these curves yields the marginal probability densities of both factors, which are piecewise linear. The cut-two-corners design and the Sierpinski gasket both satisfy cov(X_1, X_2) = −0.5 and have roughly the same input variances. In a linear model, only the input variances and covariances enter the computation of total effects, so the estimates for these two cases should be similar. Table 4 reports the results for the calculation of the total effects, using the shift-and-reweight method of Theorem 9; the nearest-neighbour approach shows only small differences. Table 4 shows that for the full design Σ_i T_i = 1, as expected for a linear model; however, the inequality 1 ≤ Σ_i T_i does not generally hold under constraints. Note also the model behaviour for the case β = 0: the total effect of X_2 correctly asserts that the model behaviour can be explained without recourse to information from X_2. The total effect of X_1 being less than one must then be due to input data dependence, not to factor interaction in the model. The sum of the total effects is large when there are competing effects, i.e., when the negatively correlated features are fed into a simulation model with positive monotonicity (β > 0).

Application: the Flood Model of De Rocquigny (2006)
We study the flood model in [10], assuming the same dependence structure in the input features as in [7].The model calculates the maximum annual overflow, given eight input features (Table 5).
The correlation between the feature pair (1, 2) is set to 0.5, and the correlations between pairs (3, 4) and (7, 8) to 0.3 each, via Gaussian copulas. The density quotients for each pair are given in (19). We calculate total indices with the derange-and-reweight approach of Theorem 9 and with nearest neighbours, fixing a budget of B = 9,000 model evaluations. The basic sample size is then n = 1,000 Monte Carlo realisations. The correlation structure in the basic sample is implemented via the Iman-Conover method [27,37]. Main effects are estimated from this basic sample using a discrete cosine transformation [48] with 8 harmonics. A jackknife is used to derive confidence bounds. For nearest neighbours, a Monte Carlo sample of size n = 9,000 (the same budget) is used. Because of the different scales of the inputs, we standardize the input sample using the empirical standard deviations as scaling factors. Figure 10 displays the results as a barplot, reporting the main and total effects for each feature, with error bands on top of the bars. The error bands for the main effects and for the shift/derangement approach are calculated as 1.96 times the square root of the plug-in variance estimates, using a normal approximation for a 5% confidence bound. Regarding the feature ranking, Feature 6 (H_d) is identified as the most important, followed by Features 1 (Q), 3 (Z_v), 2 (K_s) and 7 (C_b); the remaining features play a minor role. This result is in accordance with the findings in [7]. The main effects and the total effects (estimated with the derange-and-reweight approach) are close in value. Because Feature 6 is probabilistically independent of the remaining features, this near-equality signals that H_d is not involved in relevant interactions.
In contrast, Features 1 and 2 differ noticeably under main and total effects, with total effects larger than main effects. Also, if ranked according to main effects, Feature 3 would rank second most important, switching places with Feature 1. Overall, the derange-and-reweight approach, which evaluates the model at the mixed realizations, shows results comparable to the nearest-neighbour approach. However, the nearest-neighbour approach seems to exhibit a bias for the least important inputs; by the theoretical results of [12], for dimensions larger than four the nearest-neighbour estimator bias dominates the estimator variance. One insight gained from this application is to use both a derangement and a nearest-neighbour approach, when possible, to cross-check the estimates, because the former is less affected by bias as dimensionality increases.
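The rank-correlation induction can be sketched with a minimal version of the Iman-Conover idea: reorder each column of the sample so that its ranks follow those of Gaussian scores with the target correlation. The full method of [27] uses van der Waerden scores; this simplified variant is for illustration only.

```python
import numpy as np

def iman_conover(X, target_corr, rng):
    """Rearrange the columns of X so that their rank correlation
    approximates target_corr while the marginals stay intact
    (minimal sketch of the Iman-Conover idea)."""
    n, d = X.shape
    L = np.linalg.cholesky(np.asarray(target_corr))
    Z = rng.standard_normal((n, d)) @ L.T     # scores with target correlation
    Xs = np.sort(X, axis=0)                   # order statistics per column
    out = np.empty_like(Xs)
    for j in range(d):
        ranks = Z[:, j].argsort().argsort()   # rank of each Gaussian score
        out[:, j] = Xs[ranks, j]              # same marginals, new ordering
    return out
```

Because only the ordering changes, each column of the output is a permutation of the corresponding input column, which is exactly what a copula-style dependence injection requires.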

Conclusion
Estimating total effects under feature dependence and constraints is a challenging task.
We have first proposed a generalized winding stairs approach that relies on a Knothe-Rosenblatt transformation for the case of dependent features. However, this estimator becomes impractical for general dependence structures imposed by feature constraints. We have then offered new approaches for total effects under feature constraints, with a formal theoretical investigation of their convergence properties. The intuition is based on pairing Jansen's estimator with a reweighting factor. We first proposed a U-statistic estimator, for which a central limit theorem follows immediately from classical results. This estimator, however, turns out to be computationally expensive. We then studied two alternatives based on shifts and derangements that reduce the computational burden, and derived the corresponding central limit theorems. We also proposed and studied an estimator based on the nearest-neighbour technique. The behavior of the estimators has been tested on numerous experiments with feature constraints of increasing complexity. We have formally derived the link between total indices and Breiman's permutation feature importance measures under constraints. From a machine learning perspective, our estimators avoid the flaw of permute-and-predict methods identified by [26], not only for dependent features, but also in the more difficult setting of non-Cartesian domains: the density quotient, in fact, places a zero weight on points that violate the constraints after the permutation.

Figure 1: Total effects (unnormalized) for the linear model, including confidence bounds (gray areas): winding stairs (panels A and E), U-statistics estimator (panels B and F), shift-and-reweight (panels C and D), derange-and-reweight (panels G and H). The analytical total effects are represented by thin lines.

Figure 4: Mean squared errors for the unnormalized total effects over both factors with alternative sample generators, as a function of the number of model evaluations, using basic sample sizes from 2^8 to 2^20.

Figure 5: An input domain in form of two separate triangles.

Figure 6: An input constraint in form of a simplex condition.

Figure 7: Convergence (MSE) of total effects for models with a simplex constraint in the inputs.

Figure 9: Sierpinski marginal density derivation: evolution of the marginal integral with rejections.

Figure 10: Sensitivity results for the flood model (error bars represent the 95% confidence band).

Table 1: Computational costs for the estimation of all d total effects with constrained inputs, block sample size n.

Table 2: Averaged mean squared errors (with respect to ρ) of the different methods and sampling designs for the linear model, using 20 repetitions.

Table 3: A product model on a support of two separate triangles; relative total effects, shift-and-reweight method of Theorem 9.

Table 5: Flood model feature list.