Covering of high-dimensional cubes and quantization

As the main problem, we consider the covering of a $d$-dimensional cube by $n$ balls for reasonably large $d$ (10 or more) and reasonably small $n$, such as $n=100$ or $n=1000$. We do not require full coverage but only 90\% or 95\% coverage. We establish that efficient covering schemes have several important properties which are not seen in small dimensions or in asymptotic considerations for very large $n$. One of these properties can be termed `do not try to cover the vertices', as the vertices of the cube and their close neighbourhoods are very hard to cover and, for large $d$, there are far too many of them. We clearly demonstrate that, contrary to common belief, placing balls at points which form a low-discrepancy sequence in the cube makes for a very inefficient covering scheme. For a family of random coverings, we are able to provide very accurate approximations to the coverage probability. We then extend our results to the problem of covering a cube by smaller cubes and to the problem of quantization, the latter also being referred to as facility location. Along with theoretical considerations and the derivation of approximations, we discuss the results of a large-scale numerical investigation.


Introduction
In this paper, we develop and study efficient schemes for covering and quantization in high-dimensional cubes. In particular, we will demonstrate that the proposed schemes are much superior to the so-called 'low-discrepancy sequences'. The paper starts by introducing the main notation; then we formulate the main problem of covering a d-dimensional cube by n Euclidean balls. This is followed by a discussion of the main principles we have adopted for the construction of our algorithms. We then briefly formulate the problem of covering a cube by smaller cubes (which are balls in the L∞-norm) and the problem of quantization. Both problems have many similarities with the main problem of covering a cube by n balls. At the end of this section, we describe the structure of the remaining sections of the paper and summarize our main findings.

Main notation
- R^d: d-dimensional space;
- ‖·‖ and ‖·‖∞: Euclidean and L∞-norms in R^d;
- B_d(Z, r) = {Y ∈ R^d : ‖Y − Z‖ ≤ r}: d-dimensional ball of radius r centered at Z ∈ R^d;
- B_d(r) = B_d(0, r) = {Y ∈ R^d : ‖Y‖ ≤ r};
- S_d(Z, r) = {Y ∈ R^d : ‖Y − Z‖ = r}: d-dimensional sphere of radius r centered at Z ∈ R^d;
- C_d(Z, δ) = {Y ∈ R^d : ‖Y − Z‖∞ ≤ δ}: d-dimensional cube of side length 2δ centered at Z (it is also the d-dimensional ball in the L∞-norm with radius δ and center Z);

Main problem of interest
The main problem discussed in the paper is the following problem of covering a cube by n balls. Let C_d = [−1, 1]^d be the d-dimensional cube, Z_1, . . . , Z_n be some points in R^d and B_d(Z_j, r) be the corresponding balls of radius r centered at Z_j (j = 1, . . . , n). The dimension d, the number of balls n and their radius r can be arbitrary. We are interested in the problem of choosing the locations of the centers of the balls Z_1, . . . , Z_n so that the union of the balls ∪_j B_d(Z_j, r) covers the largest possible proportion of the cube C_d. That is, we are interested in choosing a scheme (a collection of points) Z_n = {Z_1, . . . , Z_n} so that C_d(Z_n, r), defined in (1), is as large as possible (given n, r and the freedom we are able to use in choosing Z_1, . . . , Z_n). Here C_d(Z_n, r) is the proportion of the cube C_d covered by the balls B_d(Z_j, r) (j = 1, . . . , n). For a scheme Z_n, its covering radius is defined by CR(Z_n) = max_{X∈C_d} min_{Z_j∈Z_n} ‖X − Z_j‖. In computer experiments, the covering radius is called the minimax-distance criterion, see [5] and [13]; in the theory of low-discrepancy sequences, the covering radius is called dispersion, see [8, Ch. 6]. The problem of optimal covering of a cube by n balls is of great importance for the theory of global optimization and many branches of numerical mathematics. In particular, the celebrated results of A. G. Sukharev imply that an n-point design Z_n with smallest CR provides the following: (a) the min-max n-point global optimization method in the set of all adaptive n-point optimization strategies, see [14, Ch. 4, Th. 2.1], and (b) the n-point min-max optimal quadrature, see [14, Ch. 3, Th. 1.1]. In both cases, the class of (objective) functions is the class of Lipschitz functions with known Lipschitz constant. If d is not small (say, d > 5), then computation of the covering radius CR(Z_n) for any non-trivial design Z_n is a very difficult computational problem.
This explains why the problem of constructing optimal n-point designs with smallest covering radius is notoriously difficult; see, for example, the recent surveys [16,17]. If r = CR(Z_n), then C_d(Z_n, r) defined in (1) is equal to 1, and the whole cube C_d gets covered by the balls. However, we are only interested in reaching values like 0.9, when a large part of the cube is covered. There are two main reasons why we are not interested in reaching the value C_d(Z_n, r) = 1: (a) the practical impossibility of numerically verifying full coverage when d is large enough, and (b) our approximations lose accuracy when C_d(Z_n, r) closely approaches 1.
If, for a given γ ∈ [0, 1), we have C_d(Z_n, r) ≥ 1 − γ, then the corresponding coverage of C_d will be called (1 − γ)-coverage; the corresponding value of r can be called the (1 − γ)-covering radius. If γ = 0, then the (1−γ)-coverage becomes full coverage, and the 1-covering radius of Z_n becomes the covering radius CR(Z_n). Of course, for any Z_n = {Z_1, . . . , Z_n} we can reach C_d(Z_n, r) = 1 by increasing r. Likewise, for any given r we can reach C_d(Z_n, r) = 1 by sending n → ∞. However, we are not interested in very large values of n and try to cover most of the cube C_d with the radius r as small as possible. We will keep in mind the following typical values of d and n: d = 10, 20, 50; n = 64, 128, 512, 1024. Correspondingly, we will illustrate our results in such scenarios.
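The quantities C_d(Z_n, r) and CR(Z_n) introduced above can be estimated by straightforward Monte Carlo. The following is a minimal sketch (the function name, sample sizes and the example design are ours, for illustration only); note that the maximum of the nearest-center distance over a finite sample only bounds CR(Z_n) from below.

```python
import numpy as np

def coverage_and_cr(centers, r, num_mc=100_000, seed=0):
    """Monte Carlo estimate of the covered proportion C_d(Z_n, r) and a
    lower bound on the covering radius CR(Z_n) for the cube [-1, 1]^d.

    `centers` is an (n, d) array holding Z_1, ..., Z_n.  The CR value
    returned is the maximum, over the sampled points, of the distance
    to the nearest center, hence a lower bound on the true CR."""
    rng = np.random.default_rng(seed)
    n, d = centers.shape
    X = rng.uniform(-1.0, 1.0, size=(num_mc, d))   # uniform points in C_d
    d2min = np.full(num_mc, np.inf)
    for z in centers:                              # nearest-center squared distance
        np.minimum(d2min, ((X - z) ** 2).sum(axis=1), out=d2min)
    nearest = np.sqrt(d2min)
    return float((nearest <= r).mean()), float(nearest.max())

# illustration: d = 10, n = 64 random centers restricted to [-0.5, 0.5]^10
rng = np.random.default_rng(1)
Z = rng.uniform(-0.5, 0.5, size=(64, 10))
cov, cr = coverage_and_cr(Z, r=1.5)
```

For the values of d considered here, an exact computation of CR(Z_n) is impractical, which is precisely why such sampling-based proxies are used throughout.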

Two contradictory criteria and a compromise
In choosing Z_n = {Z_1, . . . , Z_n}, the following two main criteria must be followed: (i) the volumes of intersection of the cube C_d with each individual ball B_d(Z_j, r) are not very small; (ii) the volumes of the pairwise intersections B_d(Z_j, r) ∩ B_d(Z_i, r) (i ≠ j) are small. These two criteria do not agree with each other. Indeed, as shown in Section 2 (see formulas (12)-(15)), the volume of intersection of the ball B_d(Z, r) with the cube C_d is approximately inversely proportional to ‖Z‖, and hence criterion (i) favours Z_j with small norms. However, if at least some of the points Z_j get close to 0, then the distances between these points become small and, in view of the formulas of Section 6.7, the volumes of the intersections B_d(Z_j, r) ∩ B_d(Z_i, r) become large. Thus the two criteria require a compromise in the rule for choosing Z_n = {Z_1, . . . , Z_n}: the points Z_j should not be too far from 0 but, at the same time, not too close to it. In particular, and this is clearly demonstrated in many examples throughout the paper, the so-called 'uniformly distributed sequences of points' in C_d, including 'low-discrepancy sequences' in C_d, provide poor covering schemes. This is in sharp contrast with the asymptotic case n → ∞ (and hence r → 0), where one of the recommendations, see [2, p. 84], is to choose the Z_j from a uniformly distributed sequence of points in a set which is slightly larger than C_d; this facilitates covering of the boundary of C_d, as it is much easier to cover the interior of the cube C_d than its boundary. In our considerations, n is not very large and hence the radius r of the balls cannot be small. One of our recommendations for choosing Z_n = {Z_1, . . . , Z_n} is to choose the Z_j at random in a cube C_d(δ) = [−δ, δ]^d (with 0 < δ < 1) with components distributed according to a suitable Beta-distribution. The optimal value of δ is always smaller than 1 and depends on d and n.
If d is small or n is astronomically large, then the optimal value of δ could be close to 1, but in most interesting instances this value is significantly smaller than 1. This implies that the choice δ = 1 (for example, if the Z_j form a uniformly distributed sequence of points in the whole cube C_d) often leads to very poor covering schemes, especially when the dimension d is large (see Tables 1-3 discussed in Section 3). More generally, we show that for the construction of efficient designs Z_n = {Z_1, . . . , Z_n}, either deterministic or randomized, we have to restrict the norms of the design points Z_j. We will call this principle the 'δ-effect'.
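The randomized choice just described, components of each Z_j drawn in [−δ, δ] according to a Beta-distribution, can be sketched as follows. This is an illustration under our own assumptions: we use a symmetric Beta(α, α) rescaled to [−δ, δ], which reduces to the uniform distribution on C_d(δ) when α = 1; the paper's exact parametrization may differ.

```python
import numpy as np

def beta_design(n, d, delta, alpha, seed=0):
    """Sample an n-point design whose components are i.i.d. with a
    symmetric Beta(alpha, alpha) distribution rescaled to [-delta, delta].

    alpha = 1 gives the uniform distribution on C_d(delta); alpha < 1
    pushes mass towards the faces of C_d(delta)."""
    rng = np.random.default_rng(seed)
    b = rng.beta(alpha, alpha, size=(n, d))   # values in [0, 1]
    return delta * (2.0 * b - 1.0)            # rescale to [-delta, delta]

# a design with d = 20, n = 128, delta = 0.7 and alpha = 0.5
Z = beta_design(n=128, d=20, delta=0.7, alpha=0.5)
```

Restricting δ below 1 is exactly the 'δ-effect' discussed above: the design deliberately avoids the outer shell of the cube, where the vertices live.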

Covering a cube by smaller cubes and quantization
In Section 4 we consider the problem of (1 − γ)-coverage of the cube C_d = [−1, 1]^d by smaller cubes (which are L∞-balls). The problem of 1-covering of a cube by cubes has attracted considerable attention in the mathematical literature, see e.g. [6,3]. The problem of (1 − γ)-covering of a cube by cubes turns out to be simpler than the main problem of (1 − γ)-coverage of a cube by Euclidean balls, and we have managed to derive closed-form expressions for (a) the volume of intersection of two cubes, and (b) the probability of covering a random point in C_d by n cubes C_d(Z_i, r), for a wide choice of randomized schemes for choosing the designs Z_n = {Z_1, . . . , Z_n}. The results of Section 4 show that the δ-effect holds for the problem of coverage of the cube by smaller cubes to the same degree as for the main problem of Section 3 of covering with balls. Section 5 is devoted to the following problem of quantization, also known as the problem of facility location. Let X = (x_1, . . . , x_d) be uniform on C_d = [−1, 1]^d and Z_n = {Z_1, . . . , Z_n} be an n-point design. The mean square quantization error is Q(Z_n) = E_X min_{i=1,...,n} ‖X − Z_i‖^2. In the case where Z_1, . . . , Z_n are i.i.d. uniform on C_d(δ), we derive a simple approximation for the expected value of Q(Z_n) and clearly demonstrate the δ-effect. Moreover, we notice a strong similarity between efficient quantization designs and the efficient designs constructed in Section 3.

Structure of the paper and main results
In Section 2 we derive accurate approximations for the volume of intersection of an arbitrary d-dimensional cube with an arbitrary d-dimensional ball. These formulas will be heavily used in Section 3, which is the main section of the paper dealing with the problem of (1 − γ)-coverage of a cube by n balls. In Section 4 we extend some considerations of Section 3 to the problem of (1 − γ)-coverage of the cube C d by smaller cubes. In Section 5 we argue that there is a strong similarity between efficient quantization designs and efficient designs of Section 3. In Appendix A, Section 6, we briefly mention several facts, used in the main part of the paper, related to high-dimensional cubes and balls. In Appendix B, Section 7, we prove two simple but very important lemmas about distribution and moments of certain random variables.
Our main contributions in this paper are: an accurate approximation (19) for the volume of intersection of an arbitrary d-dimensional cube with an arbitrary d-dimensional ball; an accurate approximation (27) for the expected volume of intersection of the cube C_d with n balls with uniform random centers Z_j ∈ C_d(δ); the closed-form expression of Section 4.2 for the expected volume of intersection of the cube C_d with n cubes with uniform random centers Z_j ∈ C_d(δ); the construction of efficient schemes of quantization and (1 − γ)-coverage of the cube C_d by n balls; and a large-scale numerical study.
We are preparing an accompanying paper [9] in which we will further explore the topics of Sections 3-5 and also consider the problems of quantization and (1 − γ)-coverage in the whole space R d and the problem of (1 − γ)-coverage of simplices.
Our aim is to approximate C_{d,Z,r} for arbitrary d, Z and r. We will derive a CLT-based normal approximation in Section 2.3 and then, using an asymptotic expansion in the CLT for non-identically distributed r.v., improve this normal approximation in Section 2.4. In Section 6.8 we consider a more direct approach for approximating C_{d,Z,r}, based on the use of characteristic functions and the fact that C_{d,Z,r} is the c.d.f. of ‖U − Z‖, where U = (u_1, . . . , u_d) is a random vector with uniform distribution on C_d. From this, C_{d,Z,r} can be expressed through the convolution of one-dimensional c.d.f.'s. Using this approach we can evaluate the quantity C_{d,Z,r} with high accuracy, but the calculations are rather time-consuming. Moreover, entirely new computations have to be made for different Z; therefore, we much prefer the approximation of Section 2.4. Note that in the special case Z = 0, several approximations for the quantity C_{d,0,r} have been derived in [15], but their methods cannot be generalized to arbitrary Z. Note also that symmetry considerations imply a relation between C_{d,0,r} and C_{d,Z,r} for certain special Z.

A generalization of the quantity (3)
In the next sections, we will need another quantity which slightly generalizes (3). Assume that we have the cube C_d(δ) = [−δ, δ]^d in place of C_d, and denote by C^(δ)_{d,Z,r} the proportion of C_d(δ) covered by the ball B_d(Z, r). Then the following change of the coordinates and the radius gives C^(δ)_{d,Z,r} = C_{d,Z/δ,r/δ}.

Normal approximation for the quantity (3)

Let U = (u_1, . . . , u_d) be a random vector with uniform distribution on C_d, so that u_1, . . . , u_d are i.i.d. random variables uniformly distributed on [−1, 1]. Then, for given Z = (z_1, . . . , z_d) ∈ R^d and any r > 0, C_{d,Z,r} = P{‖U − Z‖ ≤ r}. That is, C_{d,Z,r}, as a function of r, is the c.d.f. of the r.v. ‖U − Z‖. Let u have the uniform distribution on [−1, 1] and let |z| ≤ 1. In view of Lemma 1 of Section 7, the density of the r.v. η_z = (z − u)^2 can be written explicitly, as can the expressions (9) for its mean Eη_z, variance var(η_z) and third central moment μ_z^{(3)} = E(η_z − Eη_z)^3, with the expressions (9) for Eη_z, var(η_z) and μ_z^{(3)} not changing. Consider the r.v. ‖U − Z‖^2 = Σ_{j=1}^d (z_j − u_j)^2.
From (9), its mean is μ_{d,Z} = E‖U − Z‖^2 = ‖Z‖^2 + d/3. Using the independence of u_1, . . . , u_d, we also obtain from (9) that σ^2_{d,Z} = var(‖U − Z‖^2) = (4/3)‖Z‖^2 + 4d/45. If d is large enough, then the conditions of the CLT for ‖U − Z‖^2 are approximately met, and the distribution of ‖U − Z‖^2 is approximately normal with mean μ_{d,Z} and variance σ^2_{d,Z}. That is, we can approximate C_{d,Z,r} by (15): C_{d,Z,r} ≈ Φ((r^2 − μ_{d,Z})/σ_{d,Z}), where Φ(·) is the c.d.f. of the standard normal distribution. The approximation (15) has acceptable accuracy if C_{d,Z,r} is not very small, for example if it falls inside a 2σ-confidence interval generated by the standard normal distribution; see Figures 1-2 for examples. Let p_β be the tail probability of the standard normal distribution defined by Φ(β) = 1 − p_β; for example, p_β ≈ 0.05 for β = 2. As follows from (12), (13) and the approximation (15), we expect the approximate inequality (16): C_{d,Z,r} ≥ p_β whenever r^2 ≥ μ_{d,Z} − βσ_{d,Z}. In many cases discussed in Section 3, the radius r does not satisfy the inequality (16) with β = 2 or even β = 3, and hence the normal approximation (15) is not satisfactorily accurate; this can be evidenced from Figures 1-16 below.
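The normal approximation can be sketched numerically as follows. The moment formulas μ_{d,Z} = ‖Z‖² + d/3 and σ²_{d,Z} = (4/3)‖Z‖² + 4d/45 follow by direct computation from the uniform distribution of the u_k and reduce to the values d/3 and 4d/45 quoted in the text when Z = 0; the function name and the Monte Carlo check are ours.

```python
import math
import numpy as np

def normal_approx(d, norm_z_sq, r):
    """Normal approximation (15) for C_{d,Z,r}, the probability that a
    uniform point U in [-1,1]^d satisfies ||U - Z|| <= r.

    Moments of ||U - Z||^2 for u_k i.i.d. uniform on [-1, 1]:
      mean     ||Z||^2 + d/3
      variance (4/3)||Z||^2 + 4d/45
    (at Z = 0 these are the d/3 and 4d/45 quoted in the text)."""
    mu = norm_z_sq + d / 3.0
    sigma = math.sqrt((4.0 / 3.0) * norm_z_sq + 4.0 * d / 45.0)
    t = (r * r - mu) / sigma
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))  # Phi(t)

# Monte Carlo check at Z = 0, d = 50, with r^2 equal to the mean of ||U||^2
rng = np.random.default_rng(0)
d = 50
U = rng.uniform(-1.0, 1.0, size=(100_000, d))
r = math.sqrt(d / 3.0)
mc = float((np.sum(U * U, axis=1) <= r * r).mean())
approx = normal_approx(d, 0.0, r)   # equals 0.5 at the mean
```

At the mean the approximation returns exactly 0.5, while the Monte Carlo value is slightly above it because ‖U‖² is right-skewed; this is the discrepancy the correction of the next section addresses.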
In the next section, we improve the approximation (15) by using an Edgeworth-type expansion in the CLT for sums of independent non-identically distributed r.v.

Improved normal approximation
A general expansion in the central limit theorem for sums of independent non-identically distributed r.v. has been derived by V. Petrov, see Theorem 7 in Chapter 6 of [10]; see also Proposition 1.5.7 in [12]. The first three terms of this expansion have been specialized by V. Petrov in Section 5.6 of [11]. By using only the first term of this expansion, we obtain an improved approximation for the distribution function of ‖U − Z‖^2, leading to the improved form (17) of (15). From the viewpoint of Section 3, in the range of most important values of t from (18), the first correction term in (17) brings the normal approximation down and makes it much more accurate. The other terms in Petrov's expansion of [10] and [11] continue to bring the approximation down (in a much slower fashion), so that the approximation (17) still slightly overestimates the true value of C_{d,Z,r} (at least in the range of interesting values of t from (18)). However, if d is large enough (say, d ≥ 20), then the approximation (17) is very accurate and no further correction is needed. A very attractive feature of the approximations (15) and (17) is that they depend on Z through ‖Z‖ only. We could have specialized the next terms of Petrov's expansion for our case, but these terms no longer depend on ‖Z‖ only (this fact can be verified from the formula (54) for the fourth moment of the r.v. ν_z = (z − u)^2) and hence they are much more complicated. Moreover, adding one or two extra terms from Petrov's expansion to the approximation (17) does not fix the problem entirely for all Z and r. Instead, we propose a slight adjustment to the r.h.s. of (17) to improve this approximation, especially in small dimensions. Specifically, we suggest the approximation (19), where c_d = 1 + 3/d if the point Z lies on the diagonal of the cube C_d, and c_d = 1 + 4/d for a typical (random) point Z.
For typical (random) points Z ∈ C d , the values of C d,Z,r are marginally smaller than for the points on the diagonal of C d having the same norm, but the difference is very small. In addition to the points on the diagonal, there are other special points: the points whose components are all zero except for one. For such points, the values of C d,Z,r are smaller than for typical points Z with the same norm, especially for small r. Such points, however, are of no value for us as they are not typical and we have never observed in simulations random points that come close to these truly exceptional points.
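The first-term correction described above can be sketched as follows. The per-coordinate central moments are derived directly from the uniform distribution of the u_k, and the correction uses the standard first Edgeworth term Φ(t) + κ₃/(6σ³)(1 − t²)φ(t); the paper's exact expression (17), and its c_d-adjusted version (19), may differ in detail, so this is an illustration rather than the paper's formula.

```python
import math

def phi(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def Phi(t):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def edgeworth_approx(d, norm_z_sq, r):
    """Normal approximation for C_{d,Z,r} plus the first Edgeworth term.

    Central moments of (z_k - u_k)^2, u_k uniform on [-1, 1]:
      mean           z_k^2 + 1/3
      variance       (4/3) z_k^2 + 4/45
      third moment   (16/15) z_k^2 + 16/945
    summed over k, they depend on Z through ||Z||^2 only, as noted in
    the text for the approximations (15) and (17)."""
    mu = norm_z_sq + d / 3.0
    var = (4.0 / 3.0) * norm_z_sq + 4.0 * d / 45.0
    kappa3 = (16.0 / 15.0) * norm_z_sq + 16.0 * d / 945.0
    sigma = math.sqrt(var)
    t = (r * r - mu) / sigma
    # first Edgeworth term: positive near t = 0, negative for |t| > 1,
    # so it lowers the approximation in the left tail, where the small
    # values of C_{d,Z,r} relevant to Section 3 live
    return Phi(t) + kappa3 / (6.0 * sigma ** 3) * (1.0 - t * t) * phi(t)
```

Note that the correction vanishes at t = ±1 and is negative for t < −1, which matches the observation that the first term "brings the normal approximation down" in the range of interest.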

Simulation study
In Figures 1-16 we demonstrate the accuracy of the approximations (15), (17) and (19) for C_{d,Z,r} in dimensions d = 10, 50 for several locations of Z. There are figures of two types. In figures of the first type, we plot C_{d,Z,r} over a wide range of r, ensuring that the values of C_{d,Z,r} cover the whole range [0, 1]. In figures of the second type, we plot C_{d,Z,r} over a much smaller range of r, with C_{d,Z,r} lying in the range [0, ε] for some small positive ε such as ε = 0.015. For the purpose of using the approximations of Section 3, we need to assess the accuracy of all approximations for smaller values of C_{d,Z,r}, and hence the second type of plot is often more insightful. In Figures 1-14, the solid black line depicts values of C_{d,Z,r} computed via Monte Carlo methods; the blue dashed, red dot-dashed and green long-dashed lines display approximations (15), (17) and (19), respectively. In the case where Z is a random vector uniformly distributed on a sphere S_d(0, v), the style of the figures of the second type is slightly changed to adapt to this choice of Z and to provide more information for points Z which do or do not belong to the cube C_d. In Fig. 15 and Fig. 16, the thick dashed red lines correspond to random points Z ∈ S_d(0, v) ∩ C_d, and the thick dot-dashed orange lines correspond to random points Z ∈ S_d(0, v) such that Z ∉ C_d. From the simulations that led to Figures 1-16 we can make the following conclusions.
-The normal approximation (15) is quite satisfactory unless the value C d,Z,r is small.
-The accuracy of all approximations improves as d grows.
-The approximation (19) is very accurate even if the values C d,Z,r are very small.
-If d is large enough then the approximations (17) and (19) are practically identical and are extremely accurate.

Covering a cube by n balls
In this section, we consider the main problem of covering the cube C d = [−1, 1] d by the union of n balls B d (Z j , r) as formulated in Section 1.2. We will discuss different schemes of choosing the set of ball centers Z n = {Z 1 , . . . , Z n } for given d and n. The radius r will then be chosen to achieve the required probability of covering: C d (Z n , r) ≥ 1 − γ. Most of the schemes will involve one or several parameters which we will want to choose in an optimal way.

The main covering scheme
The following will be our main scheme for choosing Z_n = {Z_1, . . . , Z_n}. Scheme 1: Z_1, . . . , Z_n are i.i.d. random vectors uniformly distributed in the cube C_d(δ) = [−δ, δ]^d, where 0 < δ ≤ 1. We will formulate several other covering schemes and compare them with Scheme 1. The reasons why we have chosen Scheme 1 as the main scheme are the following.
-It is easier to theoretically investigate than all other non-trivial schemes.
-It includes, as the special case δ = 1, the scheme which is very popular in the practice of Monte Carlo methods [8] and global random search [18,19] and is believed to be rather efficient (this is not true).
-Numerical studies provided below show that Scheme 1 with optimal δ provides coverings which are rather efficient, especially for large d; see Section 3.5 for a discussion of this issue.

Theoretical investigation of Scheme 1
Let Z_1, . . . , Z_n be i.i.d. random vectors uniformly distributed in the cube C_d(δ) with 0 < δ ≤ 1. Then, for given U = (u_1, . . . , u_d) ∈ R^d, (20) gives the probability that U is covered by B_d(Z_n, r), where B_d(Z_n, r) is defined in (2). The main characteristic of interest, C_d(Z_n, r) defined in (1), the proportion of the cube covered by the union of balls B_d(Z_n, r), is then simply (21), the average of this probability over U uniform on C_d. Continuing (20), note that the probability that a single ball covers U is C^(δ)_{d,U,r}, defined by the formula (4). From (5) and (6) we have C^(δ)_{d,U,r} = C_{d,U/δ,r/δ}, where C_{d,U/δ,r/δ} is the quantity defined by (3). This quantity can be approximated in a number of different ways, as shown in Section 2. We will compare (15), the simplest of the approximations, with the approximation given in (19). Approximation (15) yields (23), whereas approximation (19) yields (24) with c_d = 1 + 4/d. From (45), E‖U‖^2 = d/3 and var(‖U‖^2) = 4d/45. Moreover, if d is large enough, then ‖U‖^2 = Σ_{j=1}^d u_j^2 is approximately normal. We shall simplify the expression (20) by using the approximation (25): (1 − t)^n ≈ exp(−nt), which is a good approximation for small values of t and moderate values of nt; this agrees with the ranges of d, n and r we are interested in.
We can combine the expressions (21) and (20) with the approximations (23), (24) and (25), as well as with the normal approximation for the distribution of ‖U‖^2, to arrive at two final approximations for C_d(Z_n, r) that differ in complexity. If the original normal approximation (23) is used, then we obtain (26); if approximation (24) is used, we obtain (27).
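The expected coverage of Scheme 1 can also be estimated directly by Monte Carlo, averaging over both the random design and the random point; this is the quantity that the approximations (26) and (27) target. A sketch under our own choices of radius and sample sizes (all illustrative):

```python
import numpy as np

def scheme1_coverage(d, n, r, delta, n_designs=10, n_points=2000, seed=0):
    """Monte Carlo estimate of the expected proportion of C_d = [-1,1]^d
    covered by n balls of radius r whose centers are i.i.d. uniform on
    C_d(delta) = [-delta, delta]^d (Scheme 1)."""
    rng = np.random.default_rng(seed)
    cov = []
    for _ in range(n_designs):
        Z = rng.uniform(-delta, delta, size=(n, d))       # one random design
        X = rng.uniform(-1.0, 1.0, size=(n_points, d))    # test points in C_d
        d2min = np.full(n_points, np.inf)
        for z in Z:                                       # nearest-center distance
            np.minimum(d2min, ((X - z) ** 2).sum(axis=1), out=d2min)
        cov.append((d2min <= r * r).mean())
    return float(np.mean(cov))

# the delta-effect for d = 20, n = 128 at a fixed (illustrative) radius
for delta in (0.4, 0.6, 0.8, 1.0):
    print(delta, scheme1_coverage(20, 128, r=2.2, delta=delta))
```

For these values of d and n, intermediate values of δ visibly beat δ = 1, in line with the δ-effect discussed in Section 1.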

Simulation study for assessing accuracy of approximations (26) and (27)

Other schemes
In addition to Scheme 1, we have also considered the following schemes for choosing Z_n = {Z_1, . . . , Z_n}. The rationale behind the choice of these schemes is as follows. By studying Scheme 2, we test the importance of including 0 in Z_n. We conjectured that if we included 0 in Z_n, the optimal value of δ might increase for some of the schemes, making them more efficient; this effect has not been detected. Scheme 3 with optimal δ is an obvious candidate for being the most efficient. Unlike all other schemes considered, Scheme 3 is only defined for values of n of the form n = 2^k with k ≤ d. By using Scheme 4, we test the possibility of improving Scheme 1 by changing the distribution of points in the cube C_d(δ). We have found that the effect of the distribution is very strong and that smaller values of α lead to more efficient covering schemes. By choosing α small enough, say α = 0.1, we can achieve an average efficiency of the covering schemes very close to the efficiency of Scheme 3. Tables 1-3 contain results obtained for Scheme 4 with α = 0.5 and α = 1.5; if α = 1, then Scheme 4 becomes Scheme 1. From Section 6.4, we know that for constructing efficient designs we have to somehow restrict the norms of the Z_j. In Schemes 5 and 6, we try to do this in a way alternative to Schemes 1 and 4. Scheme 7 is a natural improvement of Scheme 1. As a particular case (δ = 1), it contains one of the best-known low-discrepancy sequences, and hence Scheme 7 with δ = 1 serves as the main benchmark against which we compare the other schemes. For the construction, we have used the R implementation of Sobol' sequences, which is based on [4]. For all schemes excluding Scheme 3, the sequences Z_n = {Z_1, . . . , Z_n} are nested, so that Z_n ⊂ Z_m for all n < m; using the terminology of [6], these schemes provide on-line coverings of the cube. Note that for the chosen values of n, Scheme 7 also has some advantage over the other schemes considered.
Indeed, although Sobol' sequences are nested, values of n of the form n = 2^k are special for them: for such n, Sobol' sequences possess extra uniformity properties that they do not possess for other values of n.

Numerical comparison of schemes
In Tables 1-3, for Schemes 1, 2, 4, 5 and 6, we present the smallest values of r required to achieve 0.9-coverage on average. For these schemes, the value inside the brackets shows the average value of δ required to obtain 0.9-coverage. For Schemes 3 and 7, we give the smallest value of r needed for 0.9-coverage. For these two schemes, the value within the brackets corresponds to the (non-random) value of δ with which we attain such a coverage.
Table 3: Values of r and δ (in brackets) to achieve 0.9-coverage for d = 50.

From Tables 1-3 and Figures 23-30 we arrive at the following conclusions: the δ-effect is very important and becomes much stronger as d increases; the coverage provided by unadjusted low-discrepancy sequences is extremely low; the properly δ-tuned deterministic Scheme 3, which uses fractional factorial designs of minimum aberration, provides excellent covering; the randomized Scheme 4, with suitably chosen parameters of the Beta-distribution, also provides extremely high-quality covering (on average); for all schemes considered, the coverings with the optimal values of δ fully comply with the result of Section 6.4 describing the region of volume concentration in the cube C_d.
That is, F_{d,Z,r}, as a function of r, is the c.d.f. of the r.v. ‖U − Z‖∞ = max_{k} |u_k − z_k|.
Since the c.d.f. of a maximum of independent r.v. is the product of the marginal c.d.f.'s, we obtain a closed-form expression for F_{d,Z,r}. There are two extreme particular cases for the location of Z. Assume now that we have the cube C_d(δ) = [−δ, δ]^d of volume (2δ)^d and another cube C_d(Z′, r′) = {Y ∈ R^d : ‖Y − Z′‖∞ ≤ r′} with center at a point Z′ = (z′_1, . . . , z′_d). Denote by F^(δ)_{d,Z′,r′} the fraction of the cube C_d(δ) covered by C_d(Z′, r′). Then, by changing the coordinates and the radius using (5), we get F^(δ)_{d,Z′,r′} = F_{d,Z′/δ,r′/δ}.

Proportion of a cube covered by smaller cubes with random centers
Let us take the cube C_d = [−1, 1]^d and n smaller cubes C_d(Z_j, r), j = 1, . . . , n. Denote the fraction of the cube C_d covered by C_d(Z_n, r) = ∪_{j=1}^n C_d(Z_j, r), the union of these cubes, by C_{d,Z_n,r}. Our aim is to obtain a closed-form expression for this quantity, for arbitrary d, r and n, in the case where the centers Z_1, . . . , Z_n are i.i.d. uniform on C_d(δ). Similarly to (21), the expected coverage can be written as an average over a uniform point in C_d. For an integer k, let I_k denote the integral (36). Then, using the binomial theorem, we obtain (37). It is possible to evaluate (36) explicitly. For k = 0, we clearly have I_k = 1. For k ≥ 1, the integral I_k takes different forms depending on the values of r and δ, namely in the three cases r ≤ δ; 0 ≤ r − δ ≤ 1 with r + δ ≥ 1; and r − δ ≥ 1. In Figures 31-32, we depict values of C_{d,Z_n,r} (computed using (37)) as a function of δ for a number of choices of r. As in Section 3.5, we note that the δ-effect holds for the problem of coverage of the cube by smaller cubes.
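Because the coordinates are independent, the probability that one random cube covers a fixed point X factorizes into one-dimensional overlap fractions, which is the same product structure used for F_{d,Z,r} above. The following sketch exploits this: it evaluates the per-point coverage probability exactly and averages over Monte Carlo draws of X only, instead of carrying out the binomial expansion (36)-(37) in closed form (the function name and sample size are ours).

```python
import numpy as np

def cube_coverage(d, n, r, delta, n_points=20000, seed=0):
    """Expected proportion of C_d = [-1,1]^d covered by n cubes
    C_d(Z_j, r) with centers Z_j i.i.d. uniform on C_d(delta).

    For a fixed point X, the events {|x_k - z_k| <= r} are independent
    across coordinates, so the probability that one random cube covers X
    is a product of one-dimensional overlap fractions; averaging
    1 - (1 - p(X))^n over uniform X gives the expected coverage."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n_points, d))
    lo = np.maximum(X - r, -delta)                 # overlap of [x-r, x+r]
    hi = np.minimum(X + r, delta)                  # with [-delta, delta]
    q = np.clip(hi - lo, 0.0, None) / (2.0 * delta)
    p = np.prod(q, axis=1)                         # P(one cube covers X)
    return float((1.0 - (1.0 - p) ** n).mean())
```

Scanning δ for fixed r and n reproduces the δ-effect noted in the text: the expected coverage is maximized at some δ strictly inside (0, 1).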

Quantization
In this section, we briefly consider the following problem of quantization, also known as the problem of facility location. Let X = (x_1, . . . , x_d) be uniform on C_d = [−1, 1]^d and Z_n = {Z_1, . . . , Z_n} be an n-point design. The mean square quantization error is θ(Z_n) = E_X min_{i=1,...,n} ‖X − Z_i‖^2. In the case where Z_1, . . . , Z_n are i.i.d. uniform on C_d(δ), we will derive a simple approximation for the expected value of θ(Z_n) in order to demonstrate the δ-effect. We shall also notice a strong correlation between the efficiency of designs used for quantization and for (1 − γ)-covering as studied in Section 3. For deriving an approximation to the mean square quantization error, we choose Scheme 1 of Section 3. That is, we assume that X = (x_1, . . . , x_d) is uniform on C_d = [−1, 1]^d and Z_1, . . . , Z_n are i.i.d. uniform on a potentially smaller cube C_d(δ) = [−δ, δ]^d, 0 < δ ≤ 1. We are interested in finding δ = δ(n, d) such that the probability P{θ_n ≤ r^2} is maximal, where θ_n = min_{i=1,...,n} ‖X − Z_i‖^2. From Lemma 1, we have, for given X, the mean μ_{X,δ} and variance σ^2_{X,δ} of ‖X − Z_1‖^2. We have θ_n = min_{i=1,...,n} ξ_i, where the ξ_i are i.i.d. r.v. with the same distribution as ‖X − Z_1‖^2. Since ‖X − Z_1‖^2 is a sum of d independent r.v., for large d we can assume that it is approximately normal; that is, ξ_i ∼ N(μ_{X,δ}, σ^2_{X,δ}). Under this assumption, for the conditional expectation of θ_n = min_{i=1,...,n} ξ_i we have E[θ_n | X] ≈ μ_{X,δ} − E_n σ_{X,δ}, where E_n is the expectation of the maximum of n i.i.d. N(0, 1) r.v. For any X ∈ [0, 1]^d, both μ_{X,δ} and σ_{X,δ} decrease as δ → 0, and therefore the behaviour of Eθ_n is not obvious when δ is small.
A rough estimate for E_n is E_n ≈ √(2 log n), and a (not so rough) estimate for E_X σ_{X,δ} is E_X σ_{X,δ} ≈ √(E_X σ^2_{X,δ}). This gives the approximation (38): Eθ_n ≈ (1/3)[d(1 + δ^2) − 2δ√d √(1 + δ^2/5) √(2 log n)]. We suggest a simple modification of formula (38) to improve its accuracy for relatively small d. That is, we propose using (39): Eθ_n ≈ (√d/3) F̃_{d,n}(δ), where F̃_{d,n}(δ) = √d (1 + δ^2) − (8/5) δ √(1 + δ^2/5) · √(2 log n).
In Figures 33-34, we assess the accuracy of the approximation (39). In these two figures, the solid black line corresponds to Eθ_n obtained via Monte Carlo methods, and the dashed red line depicts the approximation.
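A numerical sketch of this comparison follows. The closed-form part uses the moment formulas E_X μ_{X,δ} = d(1 + δ²)/3 and E_X σ²_{X,δ} = (4dδ²/9)(1 + δ²/5), derived under Scheme 1, together with the rough estimate E_n ≈ √(2 log n); the function names, sample sizes and the exact rendering of (38) are our assumptions.

```python
import math
import numpy as np

def theta_approx(d, n, delta):
    """Approximation of type (38) for E[theta_n] under Scheme 1:
    (1/3) [ d(1+delta^2) - 2 delta sqrt(d) sqrt(1+delta^2/5) sqrt(2 log n) ]."""
    return (d * (1.0 + delta ** 2)
            - 2.0 * delta * math.sqrt(d) * math.sqrt(1.0 + delta ** 2 / 5.0)
            * math.sqrt(2.0 * math.log(n))) / 3.0

def theta_mc(d, n, delta, n_trials=50, n_points=2000, seed=0):
    """Monte Carlo estimate of E[theta_n] for centers i.i.d. uniform on
    C_d(delta) and X uniform on C_d."""
    rng = np.random.default_rng(seed)
    vals = []
    for _ in range(n_trials):
        Z = rng.uniform(-delta, delta, size=(n, d))
        X = rng.uniform(-1.0, 1.0, size=(n_points, d))
        d2min = np.full(n_points, np.inf)
        for z in Z:                      # squared distance to nearest center
            np.minimum(d2min, ((X - z) ** 2).sum(axis=1), out=d2min)
        vals.append(d2min.mean())
    return float(np.mean(vals))
```

As a sanity check, at δ = 0 all centers sit at the origin and both the approximation and the simulation reduce to E‖X‖² = d/3; for moderate δ the approximation drops below this value, reproducing the δ-effect.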
As follows from the results of [8, Ch. 6], for efficient covering schemes the order of convergence of the covering radius to 0, as n → ∞, is n^{−1/d}. Therefore, for the mean squared distance (which is the quantization error) we should expect the order n^{−2/d} as n → ∞. Hence, for the sake of comparing quantization errors across n, we renormalize the error from Eθ_n to n^{2/d} Eθ_n.
In Tables 4-6, we present the minimum value of n^{2/d} Eθ_n for a selection of the schemes considered in Section 3. In these tables, the value within the brackets corresponds to the value of δ at which the minimum of n^{2/d} Eθ_n is attained. For Scheme 3, typical behaviour of Eθ_n as a function of δ, for a number of values of n and d, is presented in Figures 35-38. We make the following two main conclusions from analyzing the results of this numerical study:

Table 6: Minimum value of n^{2/d} Eθ_n and δ (in brackets) across schemes and n for d = 50.
(a) the presence of a strong δ-effect, very similar to the effect observed in Section 3, and (b) for a given design Z n , there is a very strong correlation between the covering probability as studied in Section 3 and the normalized quantization error n 2/d Eθ(Z n ).
By comparing the values of δ in Tables 4-6 with those in Tables 1-3, we see a strong similarity between efficient quantization schemes and efficient covering schemes.

Appendix A

In this appendix, we briefly mention several facts, used in the main part of the paper, related to high-dimensional cubes and balls. Many of these facts are somewhat counter-intuitive and often lead to the creation of wrong heuristics in multivariate optimization and to misunderstanding of the behaviour of even simple algorithms in high-dimensional spaces. For more details concerning the material of Sections 6.1-6.4, see [1].

Volume of the ball
The volume of the ball B_d(r) = {x ∈ R^d : ‖x‖ ≤ r} can be computed by the formula vol(B_d(r)) = π^{d/2} r^d / Γ(d/2 + 1).
The volumes V_d = vol(B_d(1)) decrease very fast as d grows. For example, V_100 ≈ 2.368 · 10^{−40}, and V_d → 0 as d → ∞.

Radius of the ball of unit volume

Define r_d by vol(B_d(r_d)) = 1. Table 7 gives approximate values of r_d.
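Both quantities are one-liners from the volume formula; the sketch below (function names ours) reproduces the value V_100 ≈ 2.368 · 10^{−40} quoted above and computes r_d = V_d^{−1/d}.

```python
import math

def ball_volume(d, r=1.0):
    """Volume of the d-dimensional Euclidean ball of radius r:
    V_d(r) = pi^(d/2) / Gamma(d/2 + 1) * r^d."""
    return math.pi ** (d / 2.0) / math.gamma(d / 2.0 + 1.0) * r ** d

def unit_volume_radius(d):
    """Radius r_d of the d-dimensional ball of unit volume,
    defined by vol(B_d(r_d)) = 1."""
    return ball_volume(d) ** (-1.0 / d)

print(ball_volume(100))        # about 2.368e-40, as quoted in the text
print(unit_volume_radius(100))
```

The rapid growth of r_d with d is one of the counter-intuitive facts referred to above: a ball of unit volume must have a radius well above 1 already in moderate dimensions.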