Quasi-Monte Carlo methods for two-stage stochastic mixed-integer programs

We consider randomized QMC methods for approximating the expected recourse in two-stage stochastic optimization problems containing mixed-integer decisions in the second stage. It is known that the second-stage optimal value function is piecewise linear-quadratic with possible kinks and discontinuities at the boundaries of certain convex polyhedral sets. This structure is exploited to provide conditions implying that first and higher order terms of the integrand’s ANOVA decomposition (Math. Comp. 79 (2010), 953–966) have mixed weak first order partial derivatives. This leads to a good smooth approximation of the integrand and, hence, to good convergence rates of randomized QMC methods if the effective (superposition) dimension is low.


Introduction
Two-stage stochastic mixed-integer programs belong to the most complicated optimization problems due to multivariate integrals and discontinuous integrands (see [32,46]). Most approaches for their computational solution require first a numerical integration scheme for the multivariate integral and second an efficient solution method for the resulting specifically structured large scale mixed-integer program. For some time Monte Carlo methods appeared as the only convergent numerical integration technique for such optimization models [21]  Institute of Mathematics, Humboldt-University Berlin, Berlin, Germany are available for solving the discrete stochastic program efficiently. For the latter we refer to approaches based on combinations of decomposition, branch-and-bound and branch-and-cut (see [1,48] and the survey [47]).
The aim of the present paper is to contribute to the first computational step. Although Monte Carlo sampling methods are well established in theory and practice (see, for example, [5,12] and [49,Chapter 5]), they suffer from slow convergence rate O(n − 1 2 ). In recent years much progress has been achieved in the construction and analysis of Quasi-Monte Carlo (QMC) methods for computing integrals in high dimension d. We refer to the monograph [9], the survey [8] and the state-of-the-art [24] for presenting recent developments. For example, it is known that certain randomized QMC methods can achieve almost the optimal convergence rate O(n −1 ) if the integrands admit mixed weak first partial derivatives and, hence, belong to certain weighted tensor product Sobolev spaces on the unit cube [0, 1] d or on R d . We refer to the origins of randomized QMC methods in [39,40], a survey [28] and a short introduction [29,Section 2].
In the present paper we study the applicability of randomized QMC methods to two-stage stochastic mixed-integer programs. Integrands arising in such models are piecewise linear-quadratic and contain kinks and discontinuties along faces of convex polyhedral sets (see Sect. 2). Hence, they do not have mixed first derivatives in the classical or weak sense. However, many such integrands allow an approximate representation by a function which can be much smoother than the original integrand under certain conditions and by a nonsmooth remainder. The key here consists in a specific decomposition of the multivariate integrand with d variables into a sum of 2 d terms each depending on a group of variables indexed by a subset of {1, . . . , d}. Such decompositions depend on the choice of d commuting projections P k , k ∈ {1, . . . , d}. An important example is the analysis-of-variance (ANOVA) decomposition in which the projection P k integrates with respect to the kth variable (see Sect. 3). As first observed in [14][15][16] such ANOVA decompositions may gain smoothness due to the specific projections. Our results in Sect. 4 show that such smoothness properties hold indeed for low order ANOVA terms of the integrands in two-stage stochastic mixed-integer programming if a geometric condition is imposed on the faces of the convex polyhedral sets. Our main result in Sect. 4 (Theorem 1) states that truncated ANOVA decompositions of the integrands have mixed weak first derivatives and represent good approximations of the integrands if the marginal densities of the underlying probability distribution are sufficiently smooth and the effective (superposition) dimension (23) is low.
Thereby we extend our earlier work [29] for two-stage models without integer decisions substantially. In particular, we show that the ANOVA terms of linear two-stage integrands satisfy the relevant smoothness properties not only until order 2 (as asserted in the main result of [29]) but until any order less than d 2 . In addition, we extend the convergence analysis for randomly shifted lattice rules to such discontinuous integrands (in Sect. 5). Compared to [29] the proofs of our main results in Sects. 4 and 6 require new tools like a characterization of faces of projected polyhedra and the theory of Haar measures on the topological group of real orthogonal matrices. The latter is needed to show that for multivariate normal distributions the geometric condition is satisfied almost everywhere with respect to the Haar measure defined on the group of orthogonal matrices needed for transforming the covariance matrix to diagonal form.
In general the performance of randomized QMC methods may be significantly deteriorated for discontinuous integrands. In [17], for example, the authors derive convergence rates for functions of the form g(x)1l B (x), x ∈ [0, 1] d , where g is smooth and B is convex polyhedral. They show that the convergence rate is much lower than optimal, but it improves if some of the discontinuity faces of B are parallel to some coordinate axes (best case being all faces parallel to some coordinate axes). As noted earlier the integrands of two-stage stochastic mixed-integer programs have also discontinuities at the boundaries of convex polyhedral sets but their structure is unknown and hidden in the problem data.
Numerical experience on comparing Monte Carlo sampling, randomly scrambled Sobol' point sets and randomly shifted lattice rules for a two-stage stochastic mixed-integer electricity portfolio optimization problem is reported in detail in the accompanying paper [30]. In Sect. 7 we recall and discuss the computational results and add some conclusions.

Two-stage stochastic mixed-integer programs
Let us consider the two-stage stochastic mixed-integer program where is the infimum function of the second-stage mixed-integer linear program (u, t) := inf u 1 , y 1 + u 2 , y 2 : for all pairs (u, t) ∈ R m 1 +m 2 × R r , where c ∈ R m , X is a closed subset of R m , W 1 and W 2 are (r , m 1 ) and (r , m 2 )-matrices, respectively, q(ξ ) ∈ R m 1 +m 2 , h(ξ ) ∈ R r , and the (r , m)-matrix T (ξ ) are affine functions of ξ ∈ R d , and ρ is the probability density of a Borel probability measure P on R d . The primal and dual feasible right-hand side sets for the second-stage program are Clearly, (u, t) is finite for all (u, t) ∈ U ×T , it holds (0, 0) ∈ U ×T and (0, t) = 0 for any t ∈ T . While U is a convex polyhedral cone in R m 1 +m 2 , the structure of T is more complicated. The latter has the representation where K is the convex polyhedral cone Specific cases are (i) W 2 = 0 (pure continuous recourse) implying T = K and (ii) W 1 = 0 (pure integer recourse) leading to K = R r + . Next we introduce two assumptions: (A1) The matrices W 1 and W 2 have only rational elements. (A2) The cardinality of the set is finite, i.e., the number of integer decisions in (1) is finite. It is known that the set T is always connected (i.e., there exists a polygon connecting two arbitrary points of T ) and closed if (A1) is satisfied (see [4, Theorems 5.6.1 and 5.6.2]). The representation (3) implies that T can be decomposed into subsets of the form for each fixed t 0 ∈ T . Condition (A1) implies that the intersection in (5) may be replaced byt + K for somet ∈ T (see [4,Lemma 5.6.1]).
Hence, if (A1) is satisfied, there exist a finite subset N of N and elements t i ∈ T and z i j ∈ Z m 2 for i ∈ N and j belonging to a finite subset N i of N , such that T admits the representation The sets T (t i ), i ∈ N , are nonempty and connected (even star-shaped cf. [4, Theorem 5.6.3]), but nonconvex in general. If for some i ∈ N the set T (t i ) is nonconvex, it can be decomposed into a finite number of disjoint subsets whose closures are convex polyhedra with facets parallel to suitable facets of K. By renumbering all such subsets (for every i ∈ N ) one obtains a finite index set which is again denoted by N and subsets B i , i ∈ N , forming a partition of T . We will need the following result on optimal value functions of linear programs. For a given (r , m)-matrix W we consider the function from R m × R r to R. We define the primal and dual feasibility sets and recall some well-known properties of L (see [37,55]).

Lemma 1
The function L is finite and continuous on the convex polyhedral cone D × P in R m × R r and there exist (m, r )-matrices C j and convex polyhedral cones K j , j = 1, . . . , , such that The function L (u, ·) is convex on P for each u ∈ D, and L (·, t) is concave on D for each t ∈ P. Furthermore, the intersection K j ∩ K j , j = j , is either equal to {0} or contained in a (m + r − 1)-dimensional subspace of R m+r if the two cones are adjacent.
Now we are in the position to prove the following result on the representation and properties of the infimum function (see also [32, (2.10)] for the case of fixed u).
is continuous on U × B i for each i ∈ N and there exists a constant C > 0 such that holds for all pairs (u, t) ∈ U × T .
holds for every (u, t) ∈ U × B i and i ∈ N . Due to Lemma 1 there exist (r , m 1 ) matrices C j , j = 1, . . . , , such that inf The first infimum in (10) is lower bounded and, thus, attained. Hence, one obtains (u, t) = min for every pair (u, t) ∈ U × B i . For the remaining statements we refer to [43].
For more information on the continuity properties of on U × B i for any i ∈ N , we refer to [43]. Next we state our main representation result of the function .

Proposition 1 Assume (A1) and (A2). The function is finite and lower semicontin-
There exists a finite decomposition of U × T consisting of Borel sets U ν × B ν , ν ∈ N , such that their closures are convex polyhedral and is bilinear on each U ν × B ν . More precisely, there exist (r , m 1 ) matrices C ν and elements z ν ∈ Z m 2 such that is of the form for each (u, t) ∈ U ν × B ν . The function may have kinks or discontinuities at the boundaries of U ν × B ν , ν ∈ N .
Proof We start from the representation (8) of on U × B i for some i ∈ N and derive a further partition of U ×B i . To this end we consider the sets In addition, we consider the (r , m 1 ) matrices C j and the polyhedral cones K j , j = 1, . . . , , appearing in Lemma 2. More precisely, we need the projections pr 1 and pr 2 from R m 1 +r to R m 1 and R r , respectively, and the fact that pr 1 (K j ) and pr 2 (K j ) are also polyhedral cones for each j = 1, . . . , . For each i ∈ N we define the following subsets of U and of B i : starting from (8) in Lemma 2, using Lemma 1 and the definition of V il . Finally, we introduce a new index ν varying in a new (finite) index set N and a bijective mapping ν ↔ (i, j, l). By writing U ν instead of U i jl and B ν instead of B i jl we arrive at (11) by noting that C ν = C j and z ν = z l if ν ↔ (i, j, l). We also note that the sets U ν and the closures of B ν are convex polyhedral.
When defining the two-stage mixed-integer integrand f : (12) problem (1) may be rewritten as We introduce the additional assumption Condition (A3) refers to the standard requirements relatively complete recourse and dual feasibility (see [49,Section 2.1]). The structural result for in Proposition 1 leads to the following representation of the integrand f .

Proposition 2 Assume (A1)-(A3) and let x ∈ X . Then the integrand f is lower semicontinuous on X × R d and f (x, ·) is finite and linear-quadratic on the sets
for each ν ∈ N , where N , U ν and B ν are defined in Proposition 1.
on the sets ν (x), where the (r , m 1 ) matrix C ν and z ν ∈ Z m 2 are explained in Proposition 1. The functions f (x, ·) may have points of discontinuity or nondifferentiability at the boundaries of ν (x). The union of all ν (x) equals R d and their closures are convex polyhedral. Moreover, the estimate is valid for every pair (x, ξ) ∈ X × R d and some constantĈ > 0.
Proof The sets ν (x), ν ∈ N , form a partition of R d into Borel sets whose closures, denoted by cl ν (x), are of the form and, thus, are convex polyhedral, since h(·), T (·) and q(·) are affine functions, cl B ν , the closure of B ν , is convex polyhedral and U ν is convex polyhedral, too. The lower semicontinuity of f follows from Lemma 2. The representation (15) of f (x, ξ) for every pair (x, ξ) ∈ X × ν (x) and ν ∈ N follows immediately from (11). Since q 1 (·), q 2 (·), h(·) and T (·) are affine functions of ξ , the second summand of (15) is an affine function of ξ while the third represents a quadratic function. The final statement follows from (9) and the estimate after a few calculations for all pairs (x, ξ) ∈ X × R d and some constantĈ > 0.

ANOVA decomposition and effective dimension
The analysis of variance (ANOVA) decomposition of a function was first proposed as a tool in statistical analysis (see [18] and the survey [53]). Later it was often used for the analysis of quadrature methods mainly on [0, 1] d . Here, we make use of it on R d equipped with a probability density function ρ given in product form As in [15] we consider the weighted L p space over R d , i.e., L p,ρ (R d ), with the norm Let f ∈ L 1,ρ (R d ). The ANOVA projection P k , k ∈ D = {1, . . . , d}, is defined by Clearly, the function P k f is constant with respect to ξ k . For u ⊆ D we use |u| for its cardinality, −u for D\u and define the higher order ANOVA projection by where the product sign means composition. Due to Fubini's theorem the ordering within the product is not important and P u f is constant with respect to all ξ k , k ∈ u.
where each ANOVA term f u depends only on ξ u , i.e., on the variables ξ j with indices j ∈ u, and satisfies the property P j f u = 0 for all j ∈ u. It admits the recurrence relation It is known from [26] that the ANOVA terms are given explicitly in terms of the ANOVA projections by where P −u and P u−v mean integration with respect to ξ j , j ∈ D\u and j ∈ u\v, respectively. The first representation shows that lower order ANOVA terms f u with small |u| are given by higher order projections. The second representation reveals that the ANOVA term f u is essentially as smooth as the ANOVA Projection P −u ( f ) due to the Inheritance theorem [15,Theorem 2].
If f belongs to L 2,ρ (R d ), the projections P u f and the ANOVA terms f u also belong to Let the variance of f be given by To avoid trivial cases we assume σ ( f ) > 0 in the following. The normalized ratios where σ u ( f ) = f u 2,ρ , serve as indicators for the importance of the variable ξ u in f . They are used to define sensitivity indices of a set u ⊆ D for f in [52] and the dimension distribution of f in [31,41].
For small ε ∈ (0, 1) (ε = 0.01 is suggested in a number of papers), the effective We note that the effective superposition dimension d S (ε) is important for the error analysis of Quasi-Monte Carlo methods, but its computation is complicated. The effective truncation dimension is computationally accessible (see [52,56]). Note also that d S (ε) ≤ d T (ε) holds and the estimate is valid (see [13,56]). The estimate (25) means that the truncated ANOVA decom- represents an approximation of f . The importance of (25) is due to the fact that lower order ANOVA terms of f may have smoothness properties even if f is known to be nondifferentiable or discontinuous (see [14,15]). In that case (25) may be used in error estimates by exploiting the eventual smoothness of the lower order ANOVA terms.
To formulate smoothness conditions we follow [15] and use the notation where C ∞ 0 (R d ) denotes the space of infinitely differentiable functions with compact support in R d and D α v the classical derivative of v. We will use the same symbol for the weak derivative as for the classical one, i.e., we set D α f = g if (26) is satisfied, since classical derivatives are also weak derivatives. The latter holds because classical derivatives satisfy (26) which is just the multivariate integration by parts formula in the classical sense. We consider in the next sections the mixed Sobolev space of functions on R d having mixed weak first order derivatives that are quadratically integrable. In [54] such spaces are called Sobolev spaces with dominating mixed smoothness.

ANOVA decomposition of two-stage mixed-integer integrands
According to Proposition 2 two-stage mixed-integer integrands are discontinuous and piecewise linear-quadratic, hence, may be written in the form for all (x, ξ) ∈ X × ν (x) and some symmetric (d, d)-matrices A ν (·), d-dimensional vectors b ν (·) and real numbers c ν (·), which are all affine functions of x. The Borel sets ν (x), ν ∈ N are defined by (14) and have convex polyhedral closures. In addition to the conditions (A1)-(A3) we need to impose: (A4) The probability distribution P has finite fourth order absolute moments. Due to (16) the two-stage stochastic mixed-integer program (1) is already well defined if P has finite second order moments. However, the stronger condition (A4) together with the next one enable the use of the concepts from Sect. 3. (A5) P has a density ρ with respect to the Lebesgue measure on R d and ρ admits product form where the densities ρ i are positive and continuously differentiable, and ρ i and its derivative are bounded on R.
To apply the results in this section to general probability distributions P, one has to decompose the dependence structure of P. The latter is always possible using the multivariate distributional transform, which was first established in [44] in case that the conditional distribution functions of P are absolutely continuous. Later the distributional transform was extended to the general case (see [45]). (A6) For each face F of dimension greater than zero of the convex polyhedral sets cl ν (x), ν ∈ N , the affine hull aff(F) of F does not parallel any coordinate The face F is said to be defined by the inequality a, ξ ≤ b. Clearly, each face is itself a polyhedron. The dimension dim(P) of a polyhedron P is the dimension of its affine hull aff(P). A facet F of a polyhedron P with P = F is a face of dimension dim(P) − 1. Vertices of polyhedra are faces of dimension zero. For a short review of basic polyhedral theory we refer the reader to [20]. If F is any face of a polyhedron cl ν (x) for some ν ∈ N defined by the inequality g, ξ ≤ a for some g ∈ R d and a ∈ R, then (A6) means that all components of g do not vanish. Condition (A6) is stronger than the geometric condition imposed in [29] and important for deriving the results in this section. It will be illustrated in Example 1 and further discussed in Sect. 6. Since the polyhedra cl ν (x) are not explicitly given, condition (A6) has implicit character.
In the following we consider the ANOVA decomposition of f = f x (see (20)) for any fixed x ∈ X and show that lower order ANOVA terms of f are smoother than the function f itself. Since the ANOVA terms are given in terms of (ANOVA) projections P u (see (21)), we study first properties of projections. For If u = {k} for some k ∈ D we write ξ −k and ξ −k s with s ∈ R. First we derive bounds for P u f where f is given by (28).

Proposition 3 Let (A1)-(A5) be satisfied, x ∈ X be fixed, u ⊂ D and we consider an
integrand f = f x of the form (28). Then there exists a constantĈ > 0 such that Proof Using the representation u = {i 1 , . . . , i |u| ) and the definition (19) of P u f , we obtain from (16) for some positive constantĈ and all ξ −u ∈ u (R d ).
Next we study continuity and differentiability properties of projections and we start with first order projections P k f of f = f x for some k ∈ D. We know that holds for fixed ξ −k and every s ∈ R. According to (18) we have Due to (30) there exists a finite subsetN =N (ξ −k ) of N such that the onedimensional affine subspace {ξ −k s : s ∈ R} intersects the sets ν (x) for ν ∈N , where cl ν (x) and its adjacent sets have a common facet for every ν ∈N . Hence, there exists a partition of R into subintervals I ν = I ν (ξ −k ), ν ∈N , such that ξ −k s ∈ ν (x) for all s ∈ I ν and ν ∈N . We obtain the following representation of P k f by setting and using the identity ξ −k s = ξ −k 0 + s e k with e k denoting the element of R d having components If s ν is finite, the point ξ −k s ν belongs to the common facet of cl ν (x) and cl ν−1 (x). Let g ν = (g ν,1 , . . . , g ν,d ) ∈ R d and a ν ∈ R be selected such that the facet is defined by the inequality Hence, for finite s ν we obtain Since g ν,k = 0 due to condition (A6), we arrive at the representation Since the points s ν = s ν (ξ −k ) are affine functions of ξ −k and the integrands f ν (·, ξ −k ) are linear-quadratic in I ν (ξ −k ), classical results on integrals depending on parameters may be used to derive continuity and continuous differentiability of the projections P k f at anyξ −k ∈ k (R d ) as functions of ξ −k if the index setN (ξ k ) does not change in some neighborhood ofξ −k . In order to study the continuity of P k f also at points ξ −k where the index setsN (ξ −k ) do change in any neighborhood ofξ −k , we introduce some additional notation. Let B (ξ −k ) denote the open ball aroundξ −k with radius > 0 and let denote sets of convex polyhedra cl ν (x) that are met by the one-dimensional affine subspace {ξ −k s : s ∈ R}. Because any such subspace {ξ −k s : s ∈ R} for some ξ −k ∈ B (ξ −k ) is a parallel translation of {ξ −k s : s ∈ R}, 0 can be chosen small enough such that P(ξ −k ) ⊆ P(ξ −k ) holds for every ξ −k ∈ B 0 (ξ −k ). Therefore we have Since the polyhedra cl ν (x) are convex, the sets {ξ −k s ∈ cl ν (x) : s ∈ R} are convex, too, and, hence, represent either an interval or a single point if cl ν (x) belongs to P(ξ −k ). The latter is only possible if the one-dimensional affine space meets a vertex or an edge (i.e., faces of dimension zero or one) of cl ν (x). The subset of R d that contains all vertices and edges of all such polyhedra cl ν (x) has Lebesgue measure zero in belongs to the interior of cl ν (x). Otherwise, the interval I ν belongs to a facet of cl ν (x) which in turn is parallel to the canonical basis element e k contradicting the geometric condition (A6).

Proposition 4
Let (A1)-(A6) be satisfied, x ∈ X be fixed, k ∈ D and we consider an integrand f = f x of the form (28). Then its kth projection P k f is continuous on k (R d ) and first order continuously differentiable on k (R d )\M, where M is a closed set that is contained in the union of finitely many hyperplanes of dimension at most d − 2 and, thus, has Lebesgue measure zero in R d−1 . Moreover, the estimate holds for almost every ξ −k ∈ k (R d ), for all r ∈ D, r = k, and some constantĈ > 0. Proof First we prove continuity of P k f atξ −k and distinguish the following two cases: In case (i) we know that the function Due to Proposition 2 the estimate holds for all (s, ξ −k ) ∈ R × B 0 (ξ −k ) and some constant C > 0. The latter right-hand side represents an integrable majorant for f (ξ −k s ) and, hence, Lebesgue's dominated convergence theorem implies that P k f is continuous atξ −k .
If they disappear we set s ν (ξ −k ) = s ν+1 (ξ −k ) and include them formally into the representation of P k f (ξ −k ) which is of the form Letting ξ −k in (39) tend toξ −k and using the continuity of s ν (·) and of p j,ν (·; x) for ν belonging to the finite index setN (ξ −k ), a comparison with (38) proves the continuity of P k f atξ −k in case (ii), too. Finally, we return to case (i) and study differentiability properties of P k f at such points ξ −k ∈ k (R d ). From (32) we obtain for r ∈ D, r = k, that holds, where the corresponding term in (41) vanishes if s ν and s ν+1 , respectively, are not finite. Hence, P k f is first order continuously differentiable at points ξ −k satisfying (i) and, thus, on k (R d ) except at all boundary points of the polyhedra k ν (x), ν ∈ N . All boundaries are contained in a finite union of hyperplanes of dimension at most d − 2 which has Lebesgue measure 0 in R d−1 .
To prove the estimate (37) we fix n ∈N . To bound the first summand in (40) we note that ∂ ∂ξ r p j,ν (ξ −k ; x) is linear in ξ −k for j = 0 and constant for j = 1. The integral s j ρ k (s)ds is bounded by 1 for j = 0 and by a constant for j = 1. To bound the second summand in (40) and (41) we observe that the first factor p j,ν (ξ −k ; x) of each summand is bounded by a constant for j = 2, by a constant times ξ −k for j = 1, and by a constant times ξ −k 2 for j = 0. The second factor is bounded by a constant for j = 0, by a constant times ξ −k for j = 1, and by a constant times ξ −k 2 for j = 2. Furthermore, the coefficients of the polynoms p j,ν are affine functions of x, thus, can be bounded by a constant times max{1, x }. Altogether, both summands can be estimated from above by a constant times max{1, x } max{1, ξ −k 2 }, where the constant depends on ν and r . Finally, we note that ν and r vary in finite sets and arrive at the desired estimate (37).

Remark 1
When looking at the formula for the first order partial derivative of P k f in the proof of Proposition 4 given in (40), (41), it becomes evident that the first order differentiability result on k (R d )\M can be extended to twice partial differentiability if the conditions (A1)-(A6) are satisfied. Moreover, we state without recording the elementary proof and analogous arguments as in the last part of the preceding proof that the estimate holds for all ξ −k ∈ k (R d )\M, all q, r ∈ D\{k} and some constantC > 0.
The following example shows that the geometric condition (A6) is indispensable for Proposition 4 to hold true. Example 1 Let d = 2 and P denote a two-dimensional probability distribution with independent continuous marginal densities ρ k , k = 1, 2. We consider the two convex polyhedral cones (see the picture below)

and the infimal functions
which are piecewise constant lower semicontinuous functions. Both are simple (but typical) infimal value functions for pure integer optimization models.
Let the integrands f i be defined by where we let for simplicity x = 0. Then its kth first order ANOVA projections P k f i are where ξ −k ∈ R, k ∈ {1, 2}. We obtain for i = 1 otherwise, Hence, in general P 1 f 1 isn't continuous, but P 2 f 1 is continuous on R. The reason is that a facet of K 1 is parallel to the t 1 -axis. For i = 2 we have and, thus, P 1 f 2 and P 2 f 2 are continuous and piecewise continuously differentiable. For a discussion of the geometric condition (A6) we refer the reader to Sect. 6.
Using Proposition 4 we show now that second order projections P u f , of f with u D, |u| = 2, are even continuously differentiable on the entire space u R d .
For k, l ∈ D, k = l, we consider P k f and its projection P l P k f , i.e., the second order projection P u f of f with u = {k, l}. The function P k f is given on the space k R d which is subdivided into the sets k ( ν ), i.e., the kth projections of the original sets ν , ν ∈N , in R d . The closures k (cl ν ) of the sets k ( ν ) are convex polyhedral and have dimension d − 1 [3, Proposition 2.1]. We obtain where ξ −u = u ξ and ξ −u s = k ξ −l s , s ∈ R, and know that holds for each s ∈ R. Hence, similar as before Proposition 4 for ξ −k , for given ξ −u there exist a finite index set N 1 = N 1 (ξ −u ) and intervals I 1,ν with s 1,ν = inf I 1,ν and s 1,ν+1 = sup I 1,ν for ν ∈ N 1 such that where the first and the last interval are unbounded and the finite points s 1,ν belong to common facets G ν of two adjacent convex polyhedra of the form k (cl ν ). All such facets are kth projections of certain faces F ν of the polyhedra cl ν , i.e., k (F ν ) = G ν (see [20,Theorem 16] or [59, Lemma 7.10]). If the faces F ν are defined by the inequalities g 1,ν , ξ ≤ a 1,ν , the points s 1,ν may be represented in the form as in (33). Note that g 1,ν,l = 0 holds due to condition (A6).
To state our next result, we need the following notion. A real function g on R d is called locally Lipschitz continuous on lines if for each k ∈ D the function t → g(ξ −k t ) is Lipschitz continuous in t on compact subsets of R for almost every ξ −k ∈ k R d . (28). For any k, l ∈ D, k = l, u = {k, l}, the (ANOVA) projection P u f is continuously differentiable on u (R d ). In addition, the partial derivatives ∂ P u f ∂ξ r (ξ −u ) are locally Lipschitz continuous on lines and there exists C > 0 such that

Proposition 5 Let (A1)-(A6) be satisfied, x ∈ X be fixed and consider the integrand f = f x in
holds for every ξ −u ∈ u (R d ) and r ∈ −u.
Proof Let M be the closed set in Proposition 4. We consider k, l ∈ D with k = l and set u = {k, l}. From Proposition 4 we know that P k f is continuously differentiable at anyξ −k ∈ k (R d )\M. Hence, for any suchξ −k andξ −u = lξ −k we know that happens only at the finitely many points s = s 1,ν (ξ −u ) due to (A6) and the bound (37) is valid, we can use Lebesgue's theorem on dominated convergence. We conclude that P u f is continuously differentiable atξ −u and the identity holds for any r ∈ −u. To prove that P u f is continuously differentiable at anyξ −u ∈ u (R d ) we proceed as in the proof of Proposition 4 and consider the set P 1 (ξ −u ) of all convex polyhedra k ( ν ) such thatξ −u s ∈ k ( ν ) for some s ∈ R. The first case in the proof of Proposition 4 corresponds to the result (45). In the second case we know that for each > 0 there exists ξ −u ∈ B (ξ −u ) such that and the representation holds according to (43). We choose > 0 small enough such that the relation . If they disappear we set s 1,ν (ξ −u ) = s 1,ν+1 (ξ −u ) and include them formally into the representation of P u f (ξ −u ) which is of the form In a small ball around ξ −u this representation doesn't change. Hence, P u f is differentiable at ξ u and we obtain for any r ∈ −u where the summands in (48) cancel successively, and the first and the last term in (48) vanish by definition. Letting ξ −u converge toξ −u in the right-hand side of (49) proves the continuous differentiability of P u f atξ u , where the partial derivative with respect to ξ r is given by (45).
To show that the partial derivatives ∂ P u f ∂ξ r are locally Lipschitz continuous on lines, we consider first the partial derivative of the first order projection ∂ P k f ∂ξ r (ξ −k ) given by the Eqs. (40) and (41). Let p ∈ −u. We fix all components of ξ −k except the pth component ξ p . The representation (40) and (41) of ∂ P k f ∂ξ r (ξ −k ) is valid for all ξ p ∈ R except at finitely many pointsξ p,ν , ν = 1, . . . , N p = N p (ξ −{k, p} ). We assume that the points are ordered with respect to the natural order and observe that in each of the open intervals I p,0 = (−∞,ξ p,1 ), I p,ν = (ξ p,ν ,ξ p,ν+1 ) and I p,N p = (ξ p,N p , +∞) the partial derivative ∂ P k f ∂ξ r (ξ −k ) is equal to a sum of products of functions that are locally Lipschitz continuous with respect to ξ p . Hence, ∂ P k f ∂ξ r (ξ −k ) is Lipschitz continuous on each bounded part of I p,0 and I p,N p , and on each interval I p,ν , ν = 1, . . . , N p − 1, respectively. Now, let I B denote a bounded interval and let ξ p ,ξ p ∈ I B , ξ p <ξ p . We choose ε > 0 and κ ≤ μ such thatξ p,κ−1 < ξ p + ε <ξ p,κ − ε <ξ p,μ + ε < ξ p − ε <ξ p ≤ξ p,μ+1 and denote by ξ −u ε andξ −u −ε the elements in R d−2 in which the pth components are ξ p + ε andξ p − ε, respectively, and all other components be fixed. Similarly, we introduce the notations ξ −u s,±ε andξ −u s,ν,±ε . Then we obtain Using the local Lipschitz continuity property of ∂ P k f ∂ξ r on the intervals I p,ν with (maximal) Lipschitz modulus L > 0, we may continue the estimate Next we let ε tend to zero and make use of the continuity of ∂ P u f ∂ξ r . This leads to and, hence, to the desired Lipschitz continuity property on lines.
For r ∈ −u and ξ −u ∈ u (R d ) we conclude finally from (49) and (37) that and, thus, (44) holds for some sufficiently large constant C > 0.
The following is our main result in this section. It states that the first and second order ANOVA terms of f have mixed weak first order partial derivatives.
Theorem 1 Let (A1)-(A6) be satisfied, x ∈ X be fixed and we consider an integrand f = f x of the form (28). Then all first and second order ANOVA terms f u , 0 = |u| ≤ 2, u ⊆ D, are first order continuously differentiable and have second order mixed weak first order derivatives that belong to L 2,ρ (R d ). Hence, they belong to the mixed Sobolev space W (1,...,1)

2,ρ,mix (R d ).
Proof Due to Proposition 5 all second order projections P u f of f with |u| = 2 are continuously differentiable and their partial derivatives are locally Lipschitz continuous on lines on u (R d ). These properties carry over to higher order projections P v f with 2 < |v| < d. While the continuous differentiability follows from the dominated convergence theorem using the bound (44), the local Lipschitz continuity on lines of the partial derivatives is a consequence of Proposition 5 and of the following estimate for subsets u, v of D with u ⊂ v and 2 = |u| < |v|: According to (21) the ANOVA terms of f are given by for all nonempty subsets u of D. Hence, all ANOVA terms f u of f for |u| < d − 1 are continuously differentiable on u (R d ). The non-vanishing first order partial derivatives of the first and second order ANOVA terms are of the form for any l, k ∈ D. Hence, the functions D l f {l,k} and D k f {l,k} are locally Lipschitz continuous with respect to each of the two variables ξ l and ξ r independently when the other variable is fixed almost everywhere. Hence, D l f {l,k} and D k f {l,k} are partially differentiable with respect to ξ k and ξ l , respectively, in the sense of Sobolev (see, for example, [10, Section 4.2.3]). Furthermore, the mixed weak first derivatives coincide with the mixed first classical derivatives at some point if the latter exist at this point. We know from Remark 1 that second order classical mixed first derivatives of P k f and, thus, of all projections P v f with |v| ≤ d − 1 exist almost everywhere due to the dominated convergence theorem. Hence, the classical mixed first derivatives ∂ξ k ∂ξ l exist almost everywhere and coincide there with the mixed weak first derivatives. The bound (42) then implies that the estimate is valid for almost every pair (ξ l , ξ k ) ∈ −{l,k} R d , any l, k ∈ D, any x ∈ X and some constant C > 0. We conclude that D lk f {l,k} belongs to L 2,ρ (R d ) for all l, k ∈ D due to (A4).
Remark 2 Let f (k) denote the kth order ANOVA approximation of the two-stage mixed-integer integrand f (see (28)) for some 1 ≤ k < d. Theorem 1 furnishes conditions implying that f (2) has all mixed weak first order partial derivatives and our next Remark discusses its extension to f (k) for k > 2. According to (20) and to the orthogonality of the ANOVA terms f u in L 2,ρ one has If the effective superposition dimension d S (ε) of f (see (23)) is at most k, the mean square error of the integrands f and f (k) satisfies due to (25). For a discussion of techniques for determining and reducing the effective superposition dimension in case of (log)normal probability distributions we refer to [31,41,[56][57][58].

Remark 3
If in addition to (A1)-(A6) the densities ρ i , i ∈ D, have also continuous derivatives of order k ≥ 1 and all derivatives are bounded on R, the result in Remark 1 extends to mixed first derivatives of order k + 1 for P j f on j (R d )\M, j ∈ D. With the techniques used in Proposition 5 this allows to prove that P u f has second order mixed first derivatives if |u| = 3 and, more general, that P u f has kth order mixed first derivatives which are locally Lipschitz continuous on lines if |u| = k + 1. The corresponding bounds for the mixed derivatives can be proved, too. Finally, Theorem 1 extends to the existence of kth order mixed weak first derivatives for P u f if |u| = k, k ≤ d 2 . The representation (21) of ANOVA terms then implies that f u with |u| = k belongs to W (1, ...,1) 2,ρ,mix (R d ) for 1 ≤ k ≤ d 2 − 1. An important consequence is that Theorem 2 in Sect. 5 remains valid for two-stage mixed-integer integrands with effective superposition dimension 2 ≥ d S (ε) = k ≤ d 2 − 1 for some ε > 0 by arguing with kth order ANOVA approximations.

Remark 4
For the special case of linear two-stage integrands f it is shown in [29] that P k f is continuously differentiable on R d and has mixed weak first derivatives of order 2. Under the assumptions imposed in [29] we obtain in this case from Remark 3 that the projection P u f with |u| = k has mixed weak first derivatives of order k + 1 for 1 ≤ k ≤ d 2 and the ANOVA term f u with |u| = k belongs to W (1,...,1)

Error analysis for randomly shifted lattice rules
In this section we provide an error analysis for randomly shifted lattice rules. Convergence results for this method are known for integrands from weighted tensor product Sobolev spaces on [0, 1] d (see [8,23]). Since typical integrands in stochastic programming are defined on R d , we introduce first appropriate Sobolev spaces. Following [26,36] we start with the weighted Sobolev spaces W 1 2,γ i ,ρ i ,ψ i (R) of functions h ∈ L 2,ρ i (R) that are absolutely continuous with derivatives h ∈ L 2,ψ i (R) and positive continuous weight functions ψ i , i ∈ D = {1, . . . , d}. They are endowed with the weighted inner product where for each i ∈ D the weight γ i is positive. We know that for any The latter condition implies that the weighted Sobolev space is complete [22] and, thus, a Hilbert space. Furthermore, it is known that there exists a reproducing kernel, i.e., a function and φ i is the distribution function of the density ρ i (see [36,Lemma 1]). This means that for all x ∈ R and h ∈ W 1 2,γ i ,ρ i ,ψ i (R). For more information on reproducing kernel Hilbert spaces we refer to the seminal paper [2] and to the monograph [6]. It is known from [2] that the weighted tensor product Sobolev space is also a kernel reproducing Hilbert space with the reproducing kernel The inner product of F d is given by where the integrands I u,ρ ( f )(ξ u ) and the weights γ u are defined by In the QMC literature, this is called the unanchored setting with product weights. In order to apply QMC methods to the computation of integrals The reproducing kernel and inner product of G d are The choice of the weight functions ψ i depends on the marginal densities ρ i , i ∈ D. We refer to [25,36] for a discussion of this aspect and for a list of marginal densities and the corresponding weight functions. Now, we consider randomly shifted lattice rules for numerical integration in G d (see [23,50]). Let Z n = {z ∈ N : 1 ≤ z ≤ n, gcd (z, n) = 1} denote the set of natural numbers between 1 and n that are relatively prime to n. Given a generating vector g ∈ Z d n and a random shift vector which is uniformly distributed in [0, 1] d , the shifted lattice rule points are t j = { jg n + }, j = 1, . . . , n, where the braces indicate taking componentwise the fractional part. The corresponding randomized QMC method on G d is of the form and its shift-averaged worst-case error can be computed using the reproducing kernel. Let ϕ(n) denote the cardinality of Z n , thus, ϕ(n) = n if n is prime, and ξ j = −1 (t j ) for j = 1, . . . , n. Then we obtain from [36,Theorem 8] that a generating vector g ∈ Z d n can be constructed by a component-by-component algorithm such that for each δ ∈ (0, 1 2 ] there exists C(δ) > 0 (not depending on d) with if the following condition on the weights is satisfied and f belongs to F d . To state our next result we denote by v(P) the infimal value of (1) and by v(Q n,d ) the infimum if the integral in (1) is replaced by the randomly shifted lattice rule (51) with sample size n.
Theorem 2 Let (A1)-(A6) be satisfied, the densities ρ i , i ∈ D, be k ≥ 2 times differentiable and all derivatives be bounded on R and X be compact. Assume that all integrands f = f x , x ∈ X , of the form (28) have effective superposition dimension d S (ε) = k ≤ d 2 − 1 for some ε > 0 and that the kth order ANOVA approximation f (k) of f (see (50)) belongs to F d . Furthermore, we assume that Q n,d is a randomly shifted lattice rule (51) satisfying (52). For each δ ∈ (0, 1 2 where the sequence (a n ) converges to zero and allows the estimate with σ ( f ) denoting the standard deviation of f (22).
The result means that the sequence of random infima v(Q n,d ) converges in quadratic mean to the true infimum with the optimal convergence rate O(ϕ(n) −1+δ ) at least until the error becomes very small. Theorem 2 can be proved by following the lines of the proof of [30,Theorem 3] for k = 2 with obvious modifications by using [27,Proposition 4] and [42,Theorem 5]. We note that the differentiability properties of the kth order ANOVA approximation f (k) of f discussed in Remark 3 motivate the imposed condition for f (k) .

Generic smoothness in the normal case
Let ξ be a d-dimensional normal random vector with mean μ and nonsingular covariance matrix . Then there exists an orthogonal matrix Q such that Q Q is a diagonal matrix. Then the d-dimensional random vector η given by the transformation is normal with zero mean and diagonal covariance matrix, i.e., η has independent components. For fixed x ∈ X , let ν (x), ν ∈ N , denote the decomposition (14) of R d into Borel sets whose closures are convex polyhedral. The transformed function f (x, η) = f (x, Qη + μ) is linear-quadratic on the sets Q ν (x) − Q μ, ν ∈ N , whose closures are again convex polyhedral. The intersections of two adjacent convex polyhedral sets cl ν (x) are facets, which are contained in Since the number of facets of each polyhedral set cl ν (x) is finite, there are finitely many equations that describe all (d − 1)-dimensional affine subspaces each containing at least one facet of some polyhedron cl ν (x). A d − k dimensional face of a given polyhedral set cl ν (x) is then a subset of an affine subspace described by a system of k linear independent equations (intersection of k hyperplanes) Under the linear transformation (56), the corresponding face of the transformed polyhedron cl (Q ν (x) − Q μ) is a subset of an affine space described by the system In order to make sure that the face of the transformed polyhedron does not parallel any coordinate axis, it is sufficient to show that the system V k Qη = b must be solvable for each subset of k variables η i 1 , . . . , η i k in terms of the remaining components of η. The latter condition is satisfied if each square submatrix of the (k, d)-matrix A = V k Q is nonsingular or, equivalently, each minor of order r for 1 ≤ r ≤ k of the matrix A is nonzero. Now, let 1 ≤ r ≤ k and A r be any (r , r )-submatrix of A. Then A r is given as product of r rows of the matrix V k = (v il ) and r columns of the matrix Q = (q l j ), i.e., According to the Cauchy-Binet formula the minor |A r | = det(A r ) is of the form In particular, the minor |A r | = det(A r ) can be interpreted as a multivariate polynomial function where the variables are the entries of the r columns of Q, and the coefficients are given in terms of the entries of the r selected rows of V k . Hence, zeros of the multivariate polynomial correspond to an orthogonal matrix Q for which condition (A6) after the transformation is violated. Next we argue that the multivariate polynomial |A r | is non-constant. Assuming the contrary means that all r -minors that can be obtained from the selected r rows of the matrix V k must be zero. This implies that those r rows are not linearly independent which contradicts the construction of V k . We also note that for any d − k dimensional face with 1 ≤ k < d which defines a system V k ξ = b, a multivariate polynomial |A r | as considered above cannot contain all entries of a column of Q in its variables. It follows that the equations on the entries of Q defining the matrix Q as orthogonal cannot imply that |A r | is a constant polynomial (as it is for the polynomial |Q| over O(d, R)). For the following part we refer to [7] for an introductory presentation of the Haar measure on topological groups. It is known that O(d, R) is a compact topological group and a smooth manifold of dimension where the first term on the right-hand side corresponds to the number of elements of a matrix Q ∈ O(d, R) and the second term is the number of equations Q i , Q j = δ i, j , i, j ∈ D, i ≤ j, describing the orthonormality of the columns of Q. One important fact of O(d, R) is that this group has two connected components, one for matrices having determinant equal to 1 including the identity matrix, and the other one for matrices having determinant equal to −1. The set of real orthogonal matrices having determinant equal to 1 build a subgroup, called the special orthogonal group, and is denoted by SO(d, R).  ∈ SO(d, R), then Q transforms a polyhedron such that a face of the transformed polyhedron parallels a coordinate axis if and only if Q + parallels the same axis. Therefore, the set of orthogonal matrices transforming a polyhedron such that a resulting face parallels some axis can be described as a set S Q + ⊂ SO(d, R) having this property, or a set S Q − ⊂ (O(d, R)\SO(d, R)) having this property, where S Q − can be described as S Q − = I − S Q + (that is, every matrix in S Q − is given as a matrix in S Q + multiplied by I − ). By the invariance property of the Haar measure λ over O(d, R), we have that λ( The restriction of the Haar measure over O(d, R) to its subgroup SO(d, R) coincides with the Haar measure on SO(d, R) up to a normalization constant. Considering now especially S Q + , our aim is to show that the zero-set of the multivariate polynomial |A r | is a set of Haar measure zero over the group SO(d, R). The special orthogonal group SO(d, R) allows a parametrization via hyperspherical coordinates. We follow the presentation in [11,Chapter 1] and obtain that each Q ∈ SO(d, R) allows a representation in the form where the orthogonal matrices T i j (ϕ i j ) define a rotation in the coordinate plane where c d denotes some normalizing constant. By applying this parametrization to the multivariate polynomial |A r |, one obtains a non-constant analytic function. We recall that the zero-set of a non-constant multivariate analytic function has Lebesgue measure zero [35]. Therefore the restriction of the zero-set of the parametrized multivariate polynomial |A r | to the parametrization domain box of the special orthogonal group has zero Lebesgue measure. Hence, the zero-set of the multivariate polynomial |A r | has measure zero with respect to the Haar measure over SO(d, R). By taking finite unions of the corresponding sets of zero Haar measure over SO(d, R) with respect to all r -minors of A and all transformed polyhedra cl (Q ν (x) − Q μ) having a face parallel to some coordinate axis, the set of the corresponding special orthogonal matrices has Haar measure zero. Since the latter set transformed by I − also has zero Haar measure, we arrive at the following statement as a consequence of Theorem 1.
Theorem 3 Let (A1)-(A5) be satisfied, x ∈ X be fixed, f = f (x, ·) be given by (28) and ξ be normally distributed with nonsingular covariance matrix . After the orthogonal transformation (56) of ξ the second order ANOVA approximation f (2) of f (see Remark 2) belongs to W (1,. According to Remarks 3 and 4 Theorem 3 remains valid for kth order ANOVA approximations (50) of the integrand f for k ≤ d 2 − 1 in the two-stage mixed-integer case and for k ≤ d 2 in the two-stage linear case.

Discussion of numerical experience and conclusions
In our numerical experiments reported in the companion paper [30] we compare two randomized QMC methods, namely, randomly shifted lattice rules [38,50] and randomly scrambled Sobol' point sets (based on [19,51] and random linear scrambling [33]) with Monte Carlo methods [34] by applying them to a two-stage stochastic mixed-integer electricity portfolio optimization model. Its aim consists in minimizing the expected costs over a time horizon with T time intervals in the presence of stochastic load and prices. The latter are modelled as multivariate ARMA(1,1) process. The resulting multivariate probability distribution is normal with covariance matrix of dimension d = 2T which is factorized in the form = B B . Two types of factorizations B are used, namely, (i) standard Cholesky (CH) and (ii) principal component analysis (PCA). Under PCA we obtained in our test runs with T = 100 that the effective truncation dimension d T (ε) is equal to 2 for ε = 0.01 and the two-stage mixed-integer integrand f . We also observed that under PCA the first variable accumulates more than 90% of the total variance σ 2 ( f ). This means d S (0.01) = 2 and indeed PCA is an excellent dimension reduction technique in this case. The average of the estimated rates of convergence O(n −α ) for the root mean square error of the optimal values under PCA in our computational tests were approximately α = 0.91 for randomly shifted lattice rules, and α = 1.05 for the randomly scrambled Sobol' points. This is clearly superior to the MC convergence rate α = 0.5. The same test runs were performed by using CH instead of PCA for factorizing . The average of the estimated rates of convergence were O(n −0.5 ) for all three methods under CH although the error constants of the randomized QMC methods seemed to be smaller. An explanation for the worse rates is that the approximate smoothing effect due to the eventual smoothness of lower order ANOVA terms does not occur since the effective truncation dimension under CH always remained d T (0.01) = 200.
Compared to our earlier work in [29] we showed for linear two-stage integrands f in the present paper that even ANOVA terms f u of order 2 ≤ |u| < d 2 have mixed weak first partial derivatives (Remark 4) and that this property extends to two-stage mixed-integer integrands for 2 ≤ |u| ≤ d 2 − 1 (Remark 3). However, several questions still remain open. For example, a sufficient condition is desirable implying that lower order ANOVA terms belong to the tensor product Sobolev space F d (see Theorem 2 in Sect. 5). Furthermore, a discussion of the geometric condition (A6) in Sect. 6 beyond the case of normal probability distributions is important.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.