## Abstract

We refine the bound on the packing number, originally shown by Haussler, for shallow geometric set systems. Specifically, let \(\mathcal {V}\) be a finite set system defined over an *n*-point set *X*; we view \(\mathcal {V}\) as a set of indicator vectors in the *n*-dimensional unit cube. A \(\delta \)-separated set of \(\mathcal {V}\) is a subcollection \(\mathcal {W}\) such that the Hamming distance between each pair \(\mathbf{u}, \mathbf{v}\in \mathcal {W}\) is greater than \(\delta \), where \(\delta > 0\) is an integer parameter. The \(\delta \)-packing number is then defined as the cardinality of a largest \(\delta \)-separated subcollection of \(\mathcal {V}\). Haussler showed an asymptotically tight bound of \(\Theta ((n/\delta )^d)\) on the \(\delta \)-packing number if \(\mathcal {V}\) has VC-dimension (or *primal shatter dimension*) *d*. We refine this bound for the scenario where, for any subset \(X' \subseteq X\) of size \(m \le n\) and for any parameter \(1 \le k \le m\), the number of vectors of length at most *k* in the restriction of \(\mathcal {V}\) to \(X'\) is only \(O(m^{d_1} k^{d-d_1})\), for a fixed integer \(d > 0\) and a real parameter \(1 \le d_1 \le d\) (this generalizes the standard notion of *bounded primal shatter dimension* when \(d_1 = d\)). In this case, when \(\mathcal {V}\) is “*k*-shallow” (all vector lengths are at most *k*), we show that its \(\delta \)-packing number is \(O(n^{d_1} k^{d-d_1}/\delta ^d)\), matching Haussler’s bound for the special cases where \(d_1=d\) or \(k=n\). We present two proofs: the first is an extension of Haussler’s approach, and the second extends the proof of Chazelle, originally presented as a simplification of Haussler’s proof.

## Notes

1. The symmetric difference distance between two sets *A*, *B* is the cardinality of their symmetric difference.
2. We ignore the cases where \(d_1 < 1\), as it does not seem to appear in natural set systems; see below.
3. We note that although in the original analysis for this bound *d* is the VC-dimension, this assumption can be replaced by having just a primal shatter dimension *d*; see, e.g., [14] for the details of the analysis.
4. We note, however, that the original analysis of Haussler [12] does not rely on the primal shatter dimension, and the bound on \({{\mathrm{\mathbf {Exp}}}}_{I}[ |\mathcal {V}_{|_I}|]\) is just \(O(m^{d_0})\) due to the Sauer–Shelah Lemma.
5. In this particular step we use a different machinery than that of Haussler [12]; see the proof of Lemma 5 and our remark after Corollary 6. Therefore, \(|I_1| = m_1\), rather than \(m_1-1\). Furthermore, the constant of proportionality in the bound on \(m_1\) depends just on the primal shatter dimension *d* instead of the VC-dimension \(d_0\) as in (3).
6. We note that in the original formulation in [25], one needs to have a set of *independent* random variables \(\hat{X}_{i}\), \(i \in \{1, \ldots , \Vert \mathbf{v}\Vert \}\) with \(\hat{X} = \sum _{i=1}^{\Vert \mathbf{v}\Vert } \hat{X}_{i}\), such that \({{\mathrm{\mathbf {Exp}}}}[X] \le {{\mathrm{\mathbf {Exp}}}}[\hat{X}]\). In the scenario of our problem \(\hat{X}_i\) is taken to be a Bernoulli indicator random variable, which takes value one with probability \((m_j - 1)/n\), in which case \({{\mathrm{\mathbf {Exp}}}}[X] = {{\mathrm{\mathbf {Exp}}}}[\hat{X} ] = \Vert \mathbf{v}\Vert \cdot (m_j - 1)/n\).
7. We observe that \(2 \le \log ^{*}{(n/\delta )} - \log ^*{(d_0+1)} \le \log ^{*}{(n/\delta )}\), due to our assumption that \(\delta < n/2^{(d_0+1)}\), and the fact that \(d_0 \ge 1\).
8. For any \(p\in (0,1)\), the binomial distribution with parameter *p* on a finite set *X*, under the conditions that only one element is selected and that it does not belong to a fixed subset \(I'\subset X\), is just the uniform distribution on \(X\setminus I'\).
9. In [28] it is also required that the projection of each function onto the plane \(x_d = 0\) has a constant description complexity.
10. For the time being, *P* is an arbitrary distribution, but later on (Lemma 19) it is taken to be the uniform distribution in the obvious way, where each vector in \(\mathcal {V}\) is equally likely to be chosen. This distribution, however, may not remain uniform after the projection of \(\mathcal {V}\) onto a proper subsequence \(I' = (i_1, \ldots , i_m)\) of \(m < n\) indices, as several vectors in \(\mathcal {V}\) may be projected onto the same vector in \(\mathcal {V}_{|_{I'}}\).
11. We cannot guarantee such a relation when the VC-dimension \(d_0\) is replaced by the primal shatter dimension *d*, and therefore we proceed with the analysis using this ratio.

## References

1. Alon, N., Spencer, J.: The Probabilistic Method, 3rd edn. Wiley, New York (2008)
2. Auger, A., Doerr, B.: Theory of Randomized Search Heuristics: Foundations and Recent Developments. World Scientific, Singapore (2011)
3. Bshouty, N.H., Li, Y., Long, P.M.: Using the doubling dimension to analyze the generalization of learning algorithms. J. Comput. Syst. Sci. **75**(6), 323–335 (2009)
4. Chazelle, B.: A note on Haussler’s packing lemma (1992) (unpublished manuscript)
5. Chazelle, B., Welzl, E.: Quasi-optimal range searching in spaces of finite VC-dimension. Discrete Comput. Geom. **4**(5), 467–489 (1989)
6. Clarkson, K.L., Shor, P.W.: Applications of random sampling in computational geometry, II. Discrete Comput. Geom. **4**(5), 387–421 (1989)
7. Dudley, R.M.: Central limit theorems for empirical measures. Ann. Probab. **6**(6), 899–1049 (1978)
8. Dutta, K., Ezra, E., Ghosh, A.: Two proofs for shallow packings. In: Proceedings of the 31st International Symposium on Computational Geometry, pp. 96–110 (2015)
9. Ezra, E.: A size-sensitive discrepancy bound for set systems of bounded primal shatter dimension. SIAM J. Comput. **45**(1), 84–101 (2016) (A preliminary version appeared in Proceedings of the 25th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1378–1388. SIAM (2014))
10. Gottlieb, L.-A., Kontorovich, A., Mossel, E.: VC bounds on the cardinality of nearly orthogonal function classes. Discrete Math. **312**(10), 1766–1775 (2012)
11. Haussler, D.: Decision theoretic generalizations of the PAC model for neural net and other learning applications. Inf. Comput. **100**(1), 78–150 (1992)
12. Haussler, D.: Sphere packing numbers for subsets of the Boolean n-cube with bounded Vapnik–Chervonenkis dimension. J. Comb. Theory, Ser. A **69**(2), 217–232 (1995)
13. Haussler, D., Littlestone, N., Warmuth, M.K.: Predicting 0, 1-functions on randomly drawn points. Inf. Comput. **115**(2), 248–292 (1994)
14. Har-Peled, S.: Geometric Approximation Algorithms. American Mathematical Society, Boston (2011)
15. Har-Peled, S., Sharir, M.: Relative \((\varepsilon, \rho )\)-approximations in geometry. Discrete Comput. Geom. **45**(3), 462–496 (2011)
16. Haussler, D., Welzl, E.: Epsilon-nets and simplex range queries. Discrete Comput. Geom. **2**, 127–151 (1987)
17. Li, Y., Long, P.M., Srinivasan, A.: Improved bounds on the sample complexity of learning. J. Comput. Syst. Sci. **62**(3), 516–527 (2001)
18. Lovett, S., Meka, R.: Constructive discrepancy minimization by walking on the edges. In: Proceedings of the IEEE 53rd Annual Symposium on Foundations of Computer Science (FOCS ’12), pp. 61–67. IEEE Computer Society, Washington, DC (2012)
19. Matoušek, J.: Reporting points in halfspaces. Comput. Geom. **2**(3), 169–186 (1992)
20. Matoušek, J.: Tight upper bounds for the discrepancy of half-spaces. Discrete Comput. Geom. **13**, 593–601 (1995)
21. Matoušek, J.: Geometric Discrepancy: An Illustrated Guide. Algorithms and Combinatorics. Springer, Berlin (1999)
22. Matoušek, J.: Lectures on Discrete Geometry. Springer, Secaucus (2002)
23. Mulzer, W.: Chernoff Bounds, Personal note. http://page.mi.fu-berlin.de/mulzer/notes/misc/chernoff.pdf
24. Mustafa, N.H.: A simple proof of the shallow packing lemma. Discrete Comput. Geom. **55**(3), 739–743 (2016)
25. Panconesi, A., Srinivasan, A.: Randomized distributed edge coloring via an extension of the Chernoff–Hoeffding bounds. SIAM J. Comput. **26**, 350–368 (1997)
26. Pollard, D.: Convergence of Stochastic Processes. Springer, New York (1984)
27. Sauer, N.: On the density of families of sets. J. Comb. Theory, Ser. A **13**(1), 145–147 (1972)
28. Sharir, M., Agarwal, P.K.: Davenport-Schinzel Sequences and Their Geometric Applications. Cambridge University Press, New York (1995)
29. Shelah, S.: A combinatorial problem, stability and order for models and theories in infinitary languages. Pac. J. Math. **41**, 247–261 (1972)
30. Vapnik, V., Chervonenkis, A.: On the uniform convergence of relative frequencies of events to their probabilities. Theory Prob. Appl. **16**(2), 264–280 (1971)
31. Welzl, E.: On spanning trees with low crossing numbers. In: Data Structures and Efficient Algorithms, Final Report on the DFG Special Joint Initiative, pp. 233–249. Springer, London (1992)

## Acknowledgments

The authors would like to thank two anonymous referees for their useful comments. The second author wishes to thank Boris Aronov, Sariel Har-Peled, Aryeh Kontorovich, and Wolfgang Mulzer for useful discussions and suggestions. Last but not least, the second author thanks Ramon Van Handel for various discussions and for spotting an error in an earlier version of this paper. Work on this paper by Kunal Dutta and Arijit Ghosh has been supported by the Indo-German Max-Planck Center for Computer Science (IMPECS). Work on this paper by Esther Ezra has been supported by NSF Grants CCF-11-17336, CCF-12-16689, and NSF CAREER CCF-15-53354. A preliminary version of this paper appeared in *Proc. Sympos. Computational Geometry*, 2015, pp. 96–110 [8].

## Additional information

Editor in Charge: János Pach

Kunal Dutta is currently supported by the European Research Council Advanced Grant 339025 GUDHI (Geometric Understanding in Higher Dimensions). Arijit Ghosh is currently supported by Ramanujan Fellowship, 2016.

## Appendices

### Appendix 1: Overview of Haussler’s Approach

Let \(\mathcal {V}\subseteq \{0,1\}^n\) be a collection of indicator vectors of primal shatter dimension *d*. We denote its VC-dimension by \(d_0\); as discussed in Sect. 1, \(d_0 = O(d\log {d})\).

We first form a probability distribution *P* over \(\mathcal {V}\), implying that \(\mathcal {V}\) can be viewed (Footnote 10) as an *n*-dimensional random variable taking values in \(\{0,1\}^n\). Thus its components \(\mathcal {V}_i\), \(i=1, \ldots , n\), represent *n* correlated indicator random variables (Bernoulli random variables), and each of their values is determined by randomly selecting a vector \(\mathbf{v}\in \mathcal {V}\), and letting \(\mathcal {V}_i\) be the *i*th component of \(\mathbf{v}\). The variance of a Bernoulli random variable *B* is known to be \({{\mathrm{\mathbf {Prob}}}}[B=1] {{\mathrm{\mathbf {Prob}}}}[B=0]\), and then, for a sequence \(B_1, \ldots , B_m\) of Bernoulli random variables, the *conditional variance* of \(B_m\) given \(B_1, \ldots , B_{m-1}\) is defined as

\[
{{\mathrm{\mathbf {Var}}}}(B_m \mid B_1, \ldots , B_{m-1}) = \sum _{\mathbf{v}\in \{0,1\}^{m-1}} {{\mathrm{\mathbf {Prob}}}}(\mathbf{v}) \, {{\mathrm{\mathbf {Prob}}}}(B_m = 1 | \mathbf{v}) \, \big (1 - {{\mathrm{\mathbf {Prob}}}}(B_m = 1 | \mathbf{v})\big ),
\]

where \({{\mathrm{\mathbf {Prob}}}}(\mathbf{v}) = {{\mathrm{\mathbf {Prob}}}}(B_1 = \mathbf{v}_1, B_2 = \mathbf{v}_2, \ldots , B_{m-1} = \mathbf{v}_{m-1})\), and \({{\mathrm{\mathbf {Prob}}}}(B_m = 1 | \mathbf{v}) = {{\mathrm{\mathbf {Prob}}}}(B_m = 1 | B_1 = \mathbf{v}_1, B_2 = \mathbf{v}_2, \ldots , B_{m-1} = \mathbf{v}_{m-1})\).
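To make the conditional variance concrete, the following Python sketch evaluates it for a small explicit distribution. It assumes the standard form of the definition, namely the sum over all patterns \(\mathbf{v}\) of \({{\mathrm{\mathbf {Prob}}}}(\mathbf{v})\, {{\mathrm{\mathbf {Prob}}}}(B_m = 1 | \mathbf{v})\, {{\mathrm{\mathbf {Prob}}}}(B_m = 0 | \mathbf{v})\); the function name `conditional_variance` and the dictionary encoding of *P* are illustrative choices and not part of the original analysis.

```python
from collections import defaultdict
from fractions import Fraction


def conditional_variance(dist, given, target):
    """Var(B_target | B_given) for a distribution over 0/1 vectors.

    dist   : dict mapping 0/1 tuples to probabilities summing to 1
    given  : list of coordinate indices that are conditioned on
    target : index of the coordinate whose variance is measured
    """
    mass = defaultdict(Fraction)      # Prob(v) for each pattern v on `given`
    mass_one = defaultdict(Fraction)  # Prob(v and B_target = 1)
    for vec, p in dist.items():
        key = tuple(vec[i] for i in given)
        mass[key] += p
        if vec[target] == 1:
            mass_one[key] += p

    total = Fraction(0)
    for key, p_v in mass.items():
        if p_v == 0:
            continue                  # patterns of probability zero contribute nothing
        q = mass_one[key] / p_v       # Prob(B_target = 1 | v)
        total += p_v * q * (1 - q)    # Prob(v) * Prob(B=1 | v) * Prob(B=0 | v)
    return total


# Uniform distribution on three vectors of {0,1}^2 (a hypothetical toy example).
third = Fraction(1, 3)
P = {(0, 0): third, (0, 1): third, (1, 1): third}
print(conditional_variance(P, given=[0], target=1))  # conditioned on the first coordinate
print(conditional_variance(P, given=[], target=1))   # plain (unconditional) variance
```

Exact rational arithmetic (`Fraction`) is used so that the values can be compared with hand computations without rounding error.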

A key property in the analysis of Haussler [12] lies in the density of a *unit distance graph* \(G =(\mathcal {V},E)\), defined over \(\mathcal {V}\), whose edges correspond to all pairs \(\mathbf{u}, \mathbf{v}\in \mathcal {V}\) whose symmetric difference distance is (precisely) 1. In other words, \(\mathbf{u}\), \(\mathbf{v}\) appear as neighbors on the unit cube \(\{0,1\}^n\). It has been shown by Haussler et al. [13] that the density of *G* is bounded by the VC-dimension of \(\mathcal {V}\), that is, \(|E|/|\mathcal {V}| \le d_0\); see also [12] for an alternative proof using the technique of “shifting” (Footnote 11). Then, this low density property is exploited in order to show that once we have chosen \((n-1)\) coordinates of the random variable \(\mathcal {V}\), the variance in the choice of the remaining coordinate is relatively small. That is:

### Lemma 18

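([12]) For any distribution *P* on \(\mathcal {V}\),

\[
\sum _{i=1}^{n} {{\mathrm{\mathbf {Var}}}}(\mathcal {V}_i \mid \mathcal {V}_1, \ldots , \mathcal {V}_{i-1}, \mathcal {V}_{i+1}, \ldots , \mathcal {V}_n) \le d_0.
\]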
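The density bound \(|E|/|\mathcal {V}| \le d_0\) that underlies Lemma 18 can be checked by brute force on small examples. The Python sketch below is only an illustration under our own encoding (vectors as 0/1 tuples; the helper names `vc_dimension` and `unit_distance_edges` are hypothetical); it runs in exponential time and is not meant as an efficient procedure.

```python
from itertools import combinations, product


def vc_dimension(vectors):
    """Largest number of coordinates shattered by `vectors` (brute force)."""
    vectors = set(vectors)
    n = len(next(iter(vectors)))
    best = 0
    for size in range(1, n + 1):
        shattered = any(
            len({tuple(v[i] for i in coords) for v in vectors}) == 2 ** size
            for coords in combinations(range(n), size)
        )
        if not shattered:
            break  # shattering is closed under taking subsets, so we can stop
        best = size
    return best


def unit_distance_edges(vectors):
    """Number of pairs in `vectors` that differ in exactly one coordinate."""
    vectors = set(vectors)
    edges = 0
    for v in vectors:
        for i, bit in enumerate(v):
            if bit == 0:
                # Count each edge once, from its endpoint with a 0 at position i.
                if v[:i] + (1,) + v[i + 1:] in vectors:
                    edges += 1
    return edges


# A "star": the zero vector together with the four unit vectors of {0,1}^4.
star = [(0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
cube = list(product([0, 1], repeat=3))  # the full cube {0,1}^3

for family in (star, cube):
    d0 = vc_dimension(family)
    density = unit_distance_edges(family) / len(family)
    print(d0, density, density <= d0)
```

In the star example the graph has four edges on five vertices and VC-dimension 1, so the density is just below the bound; for the full cube the bound is far from tight.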

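As observed in [12], Lemma 18 continues to hold on any restriction of \(\mathcal {V}\) to a sequence \(I' = (i_1,\ldots , i_m)\) of \(m \le n\) indices. Indeed, when projecting \(\mathcal {V}\) onto \(I'\) the VC-dimension of the resulting set system remains at most \(d_0\). Furthermore, the conditional variance is now defined w.r.t. the induced probability distribution on \(\mathcal {V}_{|_{I'}}\) in the obvious way, where the probability to obtain a sequence of *m* values corresponds to an appropriate marginal distribution, that is,

\[
{{\mathrm{\mathbf {Prob}}}}(\mathcal {V}_{i_1} = v_1, \ldots , \mathcal {V}_{i_m} = v_m) = \sum _{\mathbf{u}\in \mathcal {V}:\ \mathbf{u}_{i_j} = v_j,\ 1 \le j \le m} {{\mathrm{\mathbf {Prob}}}}(\mathbf{u}).
\]

With this observation, we can rewrite the inequality stated in Lemma 18 as

\[
\sum _{j=1}^{m} {{\mathrm{\mathbf {Var}}}}(\mathcal {V}_{i_j} \mid \mathcal {V}_{i_1}, \ldots , \mathcal {V}_{i_{j-1}}, \mathcal {V}_{i_{j+1}}, \ldots , \mathcal {V}_{i_m}) \le d_0.
\]

If \(I'\) is a sequence chosen uniformly at random (over all such *m*-tuples), then when averaging over all choices of \(I'\) we clearly obtain:

\[
{{\mathrm{\mathbf {Exp}}}}_{I'}\Big [ \sum _{j=1}^{m} {{\mathrm{\mathbf {Var}}}}(\mathcal {V}_{i_j} \mid \mathcal {V}_{i_1}, \ldots , \mathcal {V}_{i_{j-1}}, \mathcal {V}_{i_{j+1}}, \ldots , \mathcal {V}_{i_m}) \Big ] \le d_0,
\]

or

\[
\sum _{j=1}^{m} {{\mathrm{\mathbf {Exp}}}}_{I'}\big [ {{\mathrm{\mathbf {Var}}}}(\mathcal {V}_{i_j} \mid \mathcal {V}_{i_1}, \ldots , \mathcal {V}_{i_{j-1}}, \mathcal {V}_{i_{j+1}}, \ldots , \mathcal {V}_{i_m}) \big ] \le d_0,
\]

by linearity of expectation. In fact, by symmetry of the random variables \(\mathcal {V}_{i_{j}}\) (recall that \(I'\) is a random *m*-tuple) each of the summands in the above inequality has an equal contribution, and thus, in particular (recall once again that the expectation is taken over all choices of \(I'= (i_1,\ldots , i_m)\)):

\[
{{\mathrm{\mathbf {Exp}}}}_{I'}\big [ {{\mathrm{\mathbf {Var}}}}(\mathcal {V}_{i_m} \mid \mathcal {V}_{i_1}, \ldots , \mathcal {V}_{i_{m-1}}) \big ] \le \frac{d_0}{m}, \tag{13}
\]

where we write \({{\mathrm{\mathbf {Exp}}}}_{I'}[\cdot ]\) to emphasize the fact that the expectation is taken over all choices of \(I'\). The above bound is now integrated with the next key property: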

### Lemma 19

([12]) Let \(\mathcal {V}\) be a \(\delta \)-separated subset of \(\{0,1\}^n\), for some integer \(1 \le \delta \le n\), and form a uniform distribution *P* on \(\mathcal {V}\). Let \(I = (i_1, \ldots , i_{m-1})\) be a sequence of \(m-1\) distinct indices between 1 and *n*, where *m* is any integer between 1 and *n*. Suppose now that another index \(i_m\) is drawn uniformly at random from the remaining \(n-m+1\) indices. Then

\[
{{\mathrm{\mathbf {Exp}}}}\big [ {{\mathrm{\mathbf {Var}}}}(\mathcal {V}_{i_m} \mid \mathcal {V}_{i_1}, \ldots , \mathcal {V}_{i_{m-1}}) \big ] \ge \frac{\delta }{2(n-m+1)} \Big ( 1 - \frac{|\mathcal {V}_{|_{I}}|}{|\mathcal {V}|} \Big ),
\]

where the conditional variance is taken w.r.t. the distribution *P*, and the expectation is taken w.r.t. the random choice of \(i_m\).
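Lemma 19 can be sanity-checked numerically on a tiny example. The Python sketch below assumes that the lemma takes the form \({{\mathrm{\mathbf {Exp}}}}[{{\mathrm{\mathbf {Var}}}}(\mathcal {V}_{i_m} \mid \mathcal {V}_{i_1}, \ldots , \mathcal {V}_{i_{m-1}})] \ge \frac{\delta }{2(n-m+1)} (1 - |\mathcal {V}_{|_{I}}|/|\mathcal {V}|)\), as in Haussler's original statement, and uses 0-based indices; the helper names and the four-vector family are illustrative assumptions only.

```python
from collections import defaultdict
from fractions import Fraction


def cond_var(vectors, given, target):
    """Var(V_target | V_given) under the uniform distribution on `vectors`."""
    count = defaultdict(int)  # number of vectors with each pattern on `given`
    ones = defaultdict(int)   # how many of those have a 1 at `target`
    for v in vectors:
        key = tuple(v[i] for i in given)
        count[key] += 1
        ones[key] += v[target]
    total = Fraction(0)
    for key, c in count.items():
        q = Fraction(ones[key], c)  # Prob(V_target = 1 | pattern)
        total += Fraction(c, len(vectors)) * q * (1 - q)
    return total


def lemma19_sides(vectors, given, delta):
    """Return (lhs, rhs) of the assumed inequality for the index set `given`."""
    n = len(vectors[0])
    m = len(given) + 1
    rest = [i for i in range(n) if i not in given]  # candidates for i_m
    lhs = sum(cond_var(vectors, given, i) for i in rest) / len(rest)
    projected = {tuple(v[i] for i in given) for v in vectors}
    rhs = Fraction(delta, 2 * (n - m + 1)) * (1 - Fraction(len(projected), len(vectors)))
    return lhs, rhs


# Four vectors in {0,1}^6 with pairwise Hamming distance 3 or 6, hence 2-separated.
V = [(0, 0, 0, 0, 0, 0), (1, 1, 1, 0, 0, 0), (0, 0, 0, 1, 1, 1), (1, 1, 1, 1, 1, 1)]
lhs, rhs = lemma19_sides(V, given=[0], delta=2)
print(lhs, rhs, lhs >= rhs)
```

Here conditioning on the first coordinate leaves the last three coordinates uniformly random, so the averaged conditional variance is \(3/20\), comfortably above the lower bound \(1/10\).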

We now observe that when the entire sequence \(I' = (i_1, \ldots , i_m)\) is chosen uniformly at random, then the bound in Lemma 19 continues to hold when averaging on the entire sequence \(I'\) (rather than just on \(i_m\)), that is, we have:

\[
{{\mathrm{\mathbf {Exp}}}}_{I'}\big [ {{\mathrm{\mathbf {Var}}}}(\mathcal {V}_{i_m} \mid \mathcal {V}_{i_1}, \ldots , \mathcal {V}_{i_{m-1}}) \big ] \ge \frac{\delta }{2(n-m+1)} \Big ( 1 - \frac{{{\mathrm{\mathbf {Exp}}}}_{I} [ |\mathcal {V}_{|_{I}}| ]}{|\mathcal {V}|} \Big ). \tag{14}
\]

Note that under this formulation, \(|\mathcal {V}_{|_{I}}|\) (the number of sets in the projection of \(\mathcal {V}\) onto *I*) is a random variable that depends on the choice of \(I = (i_1, \ldots , i_{m-1})\). In particular, since it does not depend on the choice of \(i_m\), we have \({{\mathrm{\mathbf {Exp}}}}_{I'} [ |\mathcal {V}_{|_{I}}| ] = {{\mathrm{\mathbf {Exp}}}}_{I} [ |\mathcal {V}_{|_{I}}| ]\).

The analysis of Haussler [12] then proceeds as follows. We assume that \(\mathcal {V}\) is \(\delta \)-separated as in Lemma 19 (then a bound on \(|\mathcal {V}|\) is the actual bound on the packing number), and then choose

\[
m := \Big \lceil \frac{(2d_0+2)(n+1)}{\delta + 2d_0 + 2} \Big \rceil \tag{15}
\]

indices \(i_1, \ldots , i_m\) uniformly at random without replacement from [*n*] (without loss of generality, we can assume that \(\delta \ge 3\) as, otherwise, we set the bound on the packing to be \(O(n^{d_1} k^{d-d_1})\), as asserted by the \((d,d_1)\) Clarkson–Shor property of \(\mathcal {V}\). Moreover, we can assume, without loss of generality, \(n \ge d_0, \delta \), and thus we have \(m \le n\)). Put \(I' = \{i_1, \ldots , i_m\}\), \(I = I' \setminus \{i_m\}\). Then the analysis in [12] combines the two bounds in Inequalities (13) and (14) in order to derive an upper bound on \(|\mathcal {V}|\), from which the bound in the Packing Lemma (Theorem 3) is obtained. Specifically, using simple algebraic manipulations, we obtain:

\[
|\mathcal {V}| \le \frac{{{\mathrm{\mathbf {Exp}}}}_{I}[ |\mathcal {V}_{|_{I}}| ]}{1 - \frac{2d_0(n-m+1)}{m\delta }}. \tag{16}
\]

It has been shown in [12] that due to our choice of *m*, we have:

\[
|\mathcal {V}| \le (d_0 + 1)\, {{\mathrm{\mathbf {Exp}}}}_{I}[ |\mathcal {V}_{|_{I}}| ],
\]

from which we obtain Inequality (1), as asserted. This completes the description of our first extension of Haussler’s analysis, as described in Sect. 2.1.
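The arithmetic behind the choice of *m* can be illustrated with a short Python sketch. It assumes Haussler's choice \(m = \lceil (2d_0+2)(n+1)/(\delta + 2d_0 + 2) \rceil \) for Inequality (15), and that Inequality (16) bounds \(|\mathcal {V}|\) by \({{\mathrm{\mathbf {Exp}}}}_{I}[ |\mathcal {V}_{|_{I}}| ]\) divided by \(1 - \frac{2d_0(n-m+1)}{m\delta }\); both forms, as well as the function names, are assumptions made for this illustration.

```python
from fractions import Fraction
from math import ceil


def choose_m(n, delta, d0):
    """Sample size m = ceil((2*d0 + 2)(n + 1) / (delta + 2*d0 + 2)) (assumed form of (15))."""
    return ceil(Fraction((2 * d0 + 2) * (n + 1), delta + 2 * d0 + 2))


def blowup_factor(n, delta, d0, m):
    """The factor 1 / (1 - 2*d0*(n - m + 1) / (m * delta)) multiplying Exp_I[|V|_I|]."""
    ratio = Fraction(2 * d0 * (n - m + 1), m * delta)
    return 1 / (1 - ratio)


# Hypothetical parameters: n = 100 points, separation delta = 10, VC-dimension d0 = 2.
n, delta, d0 = 100, 10, 2
m = choose_m(n, delta, d0)
factor = blowup_factor(n, delta, d0, m)
print(m, factor, factor <= d0 + 1)
```

With these parameters the factor is \(95/32 < 3 = d_0 + 1\), in line with the bound \(|\mathcal {V}| \le (d_0 + 1) {{\mathrm{\mathbf {Exp}}}}_{I}[ |\mathcal {V}_{|_{I}}| ]\). The same check passes for every \(\delta \ge 3\) and \(n \ge \max (d_0, \delta )\), since the choice of *m* gives \(m\delta \ge (2d_0+2)(n-m+1)\).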

In order to apply our second extension, we observe that one only needs to assume \(m \le n\) when obtaining Inequalities (13) and (14). In addition, the choice of *m* in Inequality (15) can be made slightly larger (but still smaller than *n*), since the term \(\frac{2d_0(n-m+1)}{m\delta }\) in Inequality (16) is a decreasing function of *m*, as can easily be verified. Recall that in our analysis we replace *m* by \(m_j := m \log ^{(j)}{(n/\delta )}\), where \(2 \le j \le \log ^{*}{(n/\delta )}\), in which case we still obtain \(|\mathcal {V}| \le (d_0 + 1){{\mathrm{\mathbf {Exp}}}}_{I_j}[ |\mathcal {V}_{|_{I_j}}|]\), where \(I_j\) is a set of \(m_j - 1\) indices chosen uniformly at random without replacement from [*n*].
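The sample sizes \(m_j = m \log ^{(j)}{(n/\delta )}\) can be tabulated with a few lines of Python. The sketch below assumes base-2 logarithms and the convention that \(\log ^{*}{x}\) is the number of times the logarithm must be applied before the value drops to at most 1; the function names and the sample values of *m* and \(n/\delta \) are hypothetical.

```python
import math


def iterated_log(x, j):
    """log^{(j)} x: the base-2 logarithm applied j times."""
    for _ in range(j):
        x = math.log2(x)
    return x


def log_star(x):
    """log* x: number of times log2 is applied until the value is at most 1."""
    count = 0
    while x > 1:
        x = math.log2(x)
        count += 1
    return count


def sample_size_schedule(m, ratio):
    """Map each j in [2, log*(ratio)] to m_j = m * log^{(j)}(ratio)."""
    return {j: m * iterated_log(ratio, j) for j in range(2, log_star(ratio) + 1)}


# Hypothetical values: m = 10 and n / delta = 2^16.
print(log_star(65536))
print(sample_size_schedule(10, 65536))
```

The schedule shrinks as *j* grows and reaches *m* itself at \(j = \log ^{*}{(n/\delta )}\), where the iterated logarithm equals 1 in this example.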

### Appendix 2: Proof of Theorem 17

We use the framework of Ezra [9], together with the Shallow Packing Lemma (Theorem 4), to show Theorem 17. To this end, we use the machinery and notation in [9] and only revise the proof of [9, Theorem 3.5], where we now plug the new bound of Theorem 4 into the analysis. In particular, we use the same notation and definitions for \(\mathcal {F}_j^{i}\) (representing an appropriate collection of “canonical sets”) and \(\Delta _j^{i}\) (representing a desired discrepancy bound on the canonical sets), and then bound an appropriate entropy function in order to apply the mechanism of Lovett and Meka [18]. We do not repeat these details here, and instead refer the reader to [9] for the notation and machinery. In the sequel we only show the derivation of the bound for \(\Delta _j^{i}\); the derivation of the discrepancy bound for the original sets \(S \in \Sigma \) is then straightforward by the analysis in [9], in which *S* is represented by the disjoint union of the symmetric difference of pairs of canonical sets.

Assume without loss of generality that \(\log {n}\) is an integer, and let \(k:= \log n\). By Theorem 4 and the analysis in [9], we have that for each \(i=1, \ldots , k\) and \(j=i-1, \ldots , k\),

\[
|\mathcal {F}_j^{i}| \le C \cdot \frac{2^{jd}}{2^{(d - d_1)(i-1)}},
\]

where \(C > 0\) is an appropriate constant as stated in Theorem 4. By the construction in [9], each set \(F_{j}^{i} \in \mathcal {F}_j^{i}\) satisfies \(|F_j^{i}| = O(n/2^{i-1})\), for a fixed index *i*, and any \(j=i-1, \ldots , k\).

Our discrepancy parameter \(\Delta _{j}^{i}\) is chosen as follows:

where

for an appropriate constant \(B > 5 + \log {C}\), and for a sufficiently large constant of proportionality \(A > 0\), whose choice depends on *B* and will be determined shortly (note that all three constants *A*, *B*, and *C* depend on *d*).

In order to apply the constructive discrepancy minimization technique of Lovett and Meka [18], we need to show:

### Lemma 20

Put \(s_j := n/2^{j-1}\). The choice in (17), for \(A > 0\) sufficiently large (whose choice depends on *C* and thus on *d*), satisfies

We next proceed almost verbatim as in [9].

We first note that at \(j=j_0\) the above exponent becomes a constant, whereas the bound \( C \cdot \frac{2^{jd}}{2^{(d - d_1)(i-1)}}\) (representing an appropriate packing number) becomes roughly \(n/\log {n}\) (for a fixed index *i*). Indeed, applying our choice in (17), we have

which is \(\exp \!{\big (-\frac{A^2}{16 \cdot 2^{B+1}}\big )}\) at \(j =j_0 = (1/d) \log {n} + (1 - d_1/d)(i-1) - (1/d)\log \log {n} - B\). Concerning the bound \(C \cdot \frac{2^{jd}}{2^{(d - d_1)(i-1)} }\), at \(j=j_0\) we obtain:

\[
C \cdot \frac{2^{j_0 d}}{2^{(d - d_1)(i-1)}} = \frac{C}{2^{Bd}} \cdot \frac{n}{\log {n}},
\]

as asserted.

We now fix an index *i*, split the summation into the two parts \(j \ge j_0\) and \(i-1 \le j < j_0\), and then bound each part in turn. In the first part, the exponent will “take over” the summation in the sense that it decreases superexponentially, making the other factors (with \(j > j_0\)) insignificant, and in the second part, the packing size will decrease geometrically. Thus the “peak” of this summation is obtained at \(j = j_0\), and the summands decrease as we go beyond or below \(j_0\).

For the first part, put \(j := j_0 + l\), for an integer \(l \ge 0\), and then

where the logarithmic factor is now eliminated due to the summation over *i*. The exponents in the above sum decrease superexponentially. Choosing *A* sufficiently large (say, \(A > 2^{6 + (B+1) + \log {d}}\)) and having \(B > 5 + \log {C}\) as above, we can guarantee that the latter sum is strictly smaller than *n* / 32.

When \(j < j_0\), put \(j := j_0 - l\), \(l > 0\) as above. We now obtain, by just bounding the exponent from above by 1, and using similar considerations as above:

Once again, our choice for *B* guarantees that the above (geometrically decreasing) sum is strictly smaller than *n* / 32. Thus the entire summation is bounded by *n* / 16, as asserted.

This completes the proof of Theorem 17.

## About this article

### Cite this article

Dutta, K., Ezra, E. & Ghosh, A. Two Proofs for Shallow Packings. *Discrete Comput Geom* **56**, 910–939 (2016). https://doi.org/10.1007/s00454-016-9824-0

DOI: https://doi.org/10.1007/s00454-016-9824-0

### Keywords

- Packing lemma and shallow packing lemma
- Set systems of finite VC-dimension
- Primal shatter function
- Clarkson–Shor property