The average sensitivity of an intersection of half spaces

Kane, Daniel

doi:10.1186/s40687-014-0013-6

The average sensitivity of an intersection of half spaces

Research
Open access
Published: 11 November 2014

Volume 1, article number 13, (2014)
Cite this article

Download PDF

You have full access to this open access article

Research in the Mathematical Sciences Aims and scope Submit manuscript

The average sensitivity of an intersection of half spaces

Download PDF

Daniel Kane¹

1242 Accesses
2 Citations
Explore all metrics

Abstract

We prove new bounds on the average sensitivity of the indicator function of an intersection of k halfspaces. In particular, we prove the optimal bound of $O (\sqrt{n log (k)})$ . This generalizes a result of Nazarov, who proved the analogous result in the Gaussian case, and improves upon a result of Harsha, Klivans and Meka. Furthermore, our result has implications for the runtime required to learn intersections of halfspaces.

AMS Subject Classification

Primary; 52C45

A Simpler Linear-Time Algorithm for Intersecting Two Convex Polyhedra in Three Dimensions

Article 25 April 2016

Maximum Matchings in Geometric Intersection Graphs

Article Open access 09 September 2023

Intersection Theorems for Triangles

Article 19 April 2021

Background

One of the most important measures of the complexity of a Boolean function $f : ℝ^{n} \to {\pm 1}$ is that of its average sensitivity, namely

𝔸 𝕊 (f) : = 𝔼_{x \sim_{u} {\pm 1}^{n}} [# {i : f (x) \neq f (x^{i})}]

where xⁱ above is x with the i^th coordinate flipped. The average sensitivity and related measures of noise sensitivity of a Boolean function have found several applications, perhaps most notably to the area of machine learning (see for example [1]). It has thus become important to understand how large the average sensitivity of functions in various classes can be.

Of particular interest is the study of the sensitivity of certain classes of algebraically defined functions. Gotsman and Linial [2] first studied the sensitivity of polynomial threshold functions (i.e. functions of the form f(x)=sgn(p(x)) for p a polynomial of bounded degree). They conjectured exact upper bounds on the sensitivity of polynomial threshold functions of limited degree, but were unable to prove them except in the case of linear threshold functions (when p is required to be degree 1). Since then, significant progress has been made towards proving this Conjecture. The first non-trivial bounds for large degree were proven in [3] by Diakonikolas et al. in 2010. Since then, progress has been rapid. In [4], the Gaussian analogue of the Gotsman-Linial Conjecture was proved, and in [5] the correct bound on average sensitivity was proved to within a polylogarithmic factor.

Another potential generalization of the degree-1 case of the Gotsman-Linial Conjecture (which bounds the sensitivity of the indicator function of a halfspace) would be to consider the sensitivity of the indictor function of the intersection of a bounded number of halfspaces. The Gaussian analogue of this question has already been studied. In particular, Nazarov has shown (see [6]) that the Gaussian surface area of an intersection of k halfspaces is at most $O (\sqrt{log k})$ . This suggests that the average sensitivity of such a function should be bounded by $O (\sqrt{n log k})$ . Although this bound has been believed for some time, attempts to prove it have been unsuccessful. Perhaps the closest attempt thus far was by Harsha, Klivans and Meka who show in [7] that an intersection of k sufficiently regular halfspaces has noise sensitivity with parameter ε at most log(k)^O(1)ε^1/6. In this paper, we prove that the bound of $O (\sqrt{n log (k)})$ is in face correct. In particular, we prove the following Theorem:

Theorem 1.

Let f be the indicator function of an intersection of k half spaces in n variables, then

𝔸 𝕊 (f) = O (\sqrt{n log (k)}) .

It should also be noted that Nazarov’s bound follows as a Corollary of Theorem 1, by replacing Gaussian random variables with averages of Bernoulli random variables. It is also not hard to show that this bound is tight up to constants. In particular:

Theorem 2.

If k≤2ⁿ, there exists a function f in n variables given by the intersection of at most k half spaces so that

𝔸 𝕊 (f) = Ω (\sqrt{n log (k)}) .

Our proof of Theorem 1 actually uses very little information about halfspaces. In particular, we use only the fact that linear threshold functions are monotonic in the following sense:

Definition 1.

We say that a function $f : {\pm 1}^{n} \to ℝ$ is unate if for all i, f is either increasing with respect to the i^th coordinate or decreasing with respect to the i^th coordinate.

We prove Theorem 1 by means of the following much more general statement:

Proposition 1.

Let f₁,…,f_k:{±1}ⁿ→{0,1}, be unate functions and let F:{±1}ⁿ→{0,1} be defined as $F (x) = \lor_{i = 1}^{k} f_{i} (x)$ . Then

𝔸 𝕊 (f) = O (\sqrt{n log (k)}) .

The application of Theorem 1 to machine learning is via a slightly different notion of noise sensitivity than that of the average sensitivity. In particular, we define the noise sensitivity as follows

Definition 2.

Let f:{±1}ⁿ→{0,1} be a Boolean function. For a parameter ε∈(0,1) we define the noise sensitivity of f with parameter ε to be

{ℕ 𝕊}_{ε} (f) : = Pr (f (x) \neq f (y))

where x and y are Bernoulli random variables where y is obtained from x by randomly and independently flipping the sign of each coordinate with probability ε.

Using this notation, we have that

Corollary 1.

If f:{±1}ⁿ→{0,1} is the indicator function of the intersection of k halfspaces, and ε∈(0,1) then

{ℕ 𝕊}_{ε} (f) = O (\sqrt{ε log (k)}) .

Remark 1.

This is false in general for intersections of unate functions, since if f is the tribes function on n variables (which is unate) then ${ℕ 𝕊}_{ε} (f) = Ω (1)$ so long as ε=Ω(log−1(n)).

Finally, using the L¹ polynomial regression algorithm of [1], we obtain the following:

Corollary 2.

The concept class of intersections of k halfspaces with respect to the uniform distribution on {±1}ⁿ is agnostically learnable with error o p t+ε in time $n^{O (log (k) ε^{- 2})}$ .

Proofs of the sensitivity bounds

The proof of Proposition 1 follows by a fairly natural generalization of one of the standard proofs for the case of a single unate function. In particular, if f:{±1}ⁿ→{0,1} is unate, we may assume without loss of generality that f is increasing in each coordinate. In such a case, it is easy to show that

𝔸 𝕊 (f) = 𝔼 [f (x) \sum_{i = 1}^{n} x_{i}] \leq 𝔼 [max (0, \sum_{i = 1}^{n} x_{i})] = O (\sqrt{n}) .

In fact, this technique can be extended to prove bounds on the sensitivity of unate functions with given expectation. In particular, Lemma 1 below provides an appropriate bound. Our proof of Proposition 1 turns out to be a relatively straightforward generalization of this technique. In particular, we show that by adding the f_i one at a time, the change in sensitivity is bounded by a similar function of the change in total expectation.

Lemma 1.

Let S:{±1}ⁿ→{0,1} and let $p = 𝔼 [S (x)]$ , then

𝔼 [S (x) \sum_{i = 1}^{n} x_{i}] = O (p \sqrt{n log (1 / p)}) .

Proof.

Note that:

\begin{matrix} 𝔼 [S (x) \sum_{i = 1}^{n} x_{i}] & \leq \int_{0}^{\infty} Pr (S (x) \sum_{i = 1}^{n} x_{i} > y) dy \\ \leq \int_{0}^{\infty} min (p, Pr (\sum_{i = 1}^{n} x_{i} > y)) dy \\ \leq \int_{0}^{\infty} min (p, exp (- Ω (y^{2} / n))) dy \\ \leq O (\int_{0}^{\infty} min (p, exp (- z^{2} / n) dz)) \\ \leq O (\int_{0}^{\sqrt{n log (1 / p)}} pdz + \int_{\sqrt{n log (1 / p)}}^{\infty} exp (- z^{2} / n) dz) \\ \leq O (p \sqrt{n log (1 / p)}) . \end{matrix}

We now prove Proposition 1.

Proof.

Let $F_{m} = \lor_{i = 1}^{m} f_{i} (x)$ . Let S_m(x)=F_m(x)−F_m−1(x). Let $p_{m} = 𝔼 [S_{m} (x)]$ . Our main goal will be to show that $𝔸 𝕊 (F_{m}) \leq 𝔸 𝕊 (F_{m - 1}) + O (p_{m} \sqrt{n log (p_{m})})$ , from which our result follows easily.

Consider $𝔸 𝕊 (F_{m}) - 𝔸 𝕊 (F_{m - 1})$ . We assume without loss of generality that f_m is increasing in every coordinate.

\begin{align} 𝔸 𝕊 (F_{m}) - 𝔸 𝕊 (F_{m - 1}) & = \sum_{i = 1}^{n} 𝔼 [|F_{m} (x) - F_{m} (x^{i})| - |F_{m - 1} (x) - F_{m - 1} (x^{i})|], \end{align}

where xⁱ denotes x with the i^th coordinate flipped. We make the following claim:

Claim. For each x,i,

\begin{align} |F_{m} (x) - F_{m} (x^{i})| & - |F_{m - 1} (x) - F_{m - 1} (x^{i})| \\ \leq x_{i} ((F_{m} (x) - F_{m} (x^{i})) - (F_{m - 1} (x) - F_{m - 1} (x^{i}))) . \end{align}

(1)

Proof. Our proof is based on considering two different cases.

Case 1: f_m(x)=f_m(xⁱ)=0

In this case, F_m(x)=F_m−1(x) and F_m(xⁱ)=F_m−1(xⁱ), and thus both sides of Equation 1 are 0.

Case 2: f_m(x)=1 or f_m(xⁱ)=1

Note that replacing x by xⁱ leaves both sizes of Equation 1 the same. We may therefore assume without loss of generality that x_i=1. Since f_m is increasing with respect to the i^th coordinate, f_m(x)≥f_m(xⁱ). Since at least one of them is 1, f_m(x)=1. Therefore, F_m(x)=1. Therefore, since

x_{i} (F_{m} (x) - F_{m} (x^{i})) \geq |F_{m} (x) - F_{m} (x^{i})|,

and

- x_{i} (F_{m - 1} (x) - F_{m - 1} (x^{i})) \geq - |F_{m - 1} (x) - F_{m - 1} (x^{i})|,

Equation 1 follows.

By the claim we have that

\begin{align} 𝔸 𝕊 (F_{m}) - 𝔸 𝕊 (F_{m - 1}) & \leq \sum_{i = 1}^{n} 𝔼 [x_{i} ((F_{m} (x) - F_{m} (x^{i})) - (F_{m - 1} (x) - F_{m - 1} (x^{i})))] \\ = \sum_{i = 1}^{n} 𝔼 [x_{i} (S_{m} (x) - S_{m} (x^{i}))] \\ = \sum_{i = 1}^{n} 𝔼 [x_{i} S_{m} (x)] - \sum_{i = 1}^{n} 𝔼 [(- y_{i}) S_{m} (y)] \\ = 2 𝔼 [S_{m} (x) \sum_{i = 1}^{n} x_{i}] \\ = O (p_{m} \sqrt{n log (1 / p_{m})}) . \end{align}

Where the on the third line above, we are letting y=xⁱ, and the last line is by Lemma 1.

Hence, we have that

\begin{align} 𝔸 𝕊 (F) & = \sum_{m = 1}^{k} 𝔸 𝕊 (F_{m}) - 𝔸 𝕊 (F_{m - 1}) \\ = O (\sqrt{n} \sum_{m = 1}^{k} p_{m} \sqrt{log (1 / p_{m})}) . \end{align}

Let $P = 𝔼 [F (x)] = \sum_{m = 1}^{k} p_{m}$ . By concavity of the function $x \sqrt{log (1 / x)}$ for x∈(0,1), we have that

𝔸 𝕊 (F) = O (\sqrt{n} P \sqrt{log (k / P)}) = O (\sqrt{n log (k)}) .

This completes our proof.

Theorem 1 follows from Proposition 1 upon noting that 1−f is a disjunction of k linear threshold functions, each of which is unate. Our proof of Theorem 1 shows that the bound can be tight only if a large number of the halfspaces cut off an incremental volume of roughly 1/k. It turns out that this bound can be achieved when we take a random collection of halfspaces with such volumes. Before we begin to prove Theorem 2, we need the following Lemma:

Lemma 2.

For an integer n and 1/2>ε>2⁻ⁿ there exists a linear threshold function f:{±1}ⁿ→{0,1} so that

𝔼_{x} [f (x)] \geq ε,

and

𝔸 𝕊 (f) = Ω (𝔼_{x} [f (x)] \sqrt{n log (1 / ε)}) .

Proof.

This is easily seen to be the case if we let f(x) be the indicator function of $\sum_{i = 1}^{n} x_{i} > t$ for t as large as possible so that this event takes place with probability at least ε.

Proof.

We note that it suffices to show that there is such as f given as the indicator function of a union of at most k half-spaces, as 1−f will have the same average sensitivity and will be the indicator function of an intersection. Let ε=1/k, and let f be the function given to us in Lemma 2. We note that if $𝔼 [f (x)] > 1 / 4$ , then f is sufficient and we are done. Otherwise let $m = ⌊ 1 / (4 𝔼 [f (x)]) ⌋ \leq k$ . For s∈{±1}ⁿ let f_s(x)=f(s₁x₁,…,s_nx_n). We note for each s that f_s(x) is a linear threshold function with $𝔼 [f_{s} (x)] = 𝔼 [f (x)]$ and $𝔸 𝕊 (f_{s}) = 𝔸 𝕊 (f)$ .

Let

F (x) = \lor_{i = 1}^{m} f_{s_{i}} (x)

for s_i independent random elements of {±1}ⁿ. We note that F(x) is always the indicator of a union of at most k half-spaces, but we also claim that

𝔼_{s_{i}} [𝔸 𝕊 (F)] = Ω (\sqrt{n log (k)}) .

This would imply our result for appropriately chosen values of the s_i.

We note that $𝔸 𝕊 (F)$ is 2¹⁻ⁿ times the number of pairs of adjacent elements x,y of the hypercube so that F(x)=1,F(y)=0. This in turn is at least 2¹⁻ⁿ times the sum over 1≤i≤m of the number of pairs of adjacent elements of the hypercube x,y so that $f_{s_{i}} (x) = 1, f_{s_{i}} (y) = 0$ and so that $f_{s_{j}} (x) = f_{s_{j}} (y) = 0$ for all j≠i.

On the other hand, for each i, 2¹⁻ⁿ times the number of pairs of adjacent elements x,y so that $f_{s_{i}} (x) = 1, f_{s_{i}} (y) = 0$ is

𝔸 𝕊 (f_{s_{i}}) = 𝔸 𝕊 (f) = Ω (𝔼 [f (x)] \sqrt{n log (k)}) = Ω (m^{- 1} \sqrt{n log (k)}) .

For each of these pairs, we consider the probability over the choice of s_j that $f_{s_{j}} (x) = 1$ or $f_{s_{j}} (y) = 1$ for some j≠i. We note that for each fixed x and j that

{Pr}_{s_{j}} (f_{s_{j}} (x) = 1) = 𝔼_{s_{j}} [f_{s_{j}} (x)] = 𝔼_{s_{j}} [f_{x} (s_{j})] = 𝔼_{z} [f (z)] \leq \frac{1}{4 m} .

Thus, by the union bound, the probability that either $f_{s_{j}} (x) = 1$ or $f_{s_{j}} (y) = 1$ for some j≠i is at most 1/2. Therefore, the expected number of adjacent pairs x,y with $f_{s_{i}} (x) = 1$ , $f_{s_{i}} (y) = 0$ and $f_{s_{j}} (x) = f_{s_{j}} (y) = 0$ for all j≠i is at least $𝔸 𝕊 (f_{s_{j}}) / 2$ . Therefore,

𝔼_{s_{i}} [𝔸 𝕊 (F)] \geq \sum_{i = 1}^{m} 𝔸 𝕊 (f) / 2 = mΩ (m^{- 1} \sqrt{n log (k)}) = Ω (\sqrt{n log (k)}),

as desired. This completes our proof.

Learning theory application

The proofs of Corollaries 1 and 2 are by what are now fairly standard techniques, but are included here for completeness. The proof of Corollary 1 is by a technique of Diakonikolas et al. in [8] for bounding the noise sensitivity in terms of the average sensitivity.

Proof.

As the noise sensitivity is an increasing function of ε for ε<1/2, we may round ε down to 1/⌈ε⁻¹⌉, and thus it suffices to consider ε=1/m for some integer m. We note that the pair of random variables x, y used to define the noise sensitivity with parameter ε can be generated in the following way:

1.
Randomly divide the n coordinates into m bins.
2.
Randomly assign each coordinate a value in {±1} to obtain z.
3.
For each bin randomly pick b _i∈{±1}. Obtain x from z by multiplying all coordinates in the i ^th bin by b _i for each i.
4.
Obtain y from x by flipping the sign of all coordinates in a randomly chosen bin.

We note that this produces the same distribution on x and y since x is clearly a uniform element of {±1}ⁿ and the i^th coordinate of y differs from the corresponding coordinate of x if and only if i lies in the bin selected in step 4. This happens independently and with probability 1/m for each coordinate.

Next let f be the indicator function of an intersection of at most k halfspaces. Note that after the bins and z are picked in steps 1 and 2 above that f(x) is given by g(b) where g is the indicator function of an intersection of at most k halfspaces in m variables. In the same notation, f(y)=g(b^′) where b^′ is obtained from b by flipping the sign of a single random coordinate. Thus, by definition, $Pr (g (b) \neq g (b^{'})) = \frac{1}{m} 𝔸 𝕊 (g)$ . Hence,

{ℕ 𝕊}_{ε} (f) = 𝔼_{g} [\frac{𝔸 𝕊 (g)}{m}] \leq \frac{O (\sqrt{log (k) m})}{m} = \sqrt{\frac{log (k)}{m}} = \sqrt{ε log (k)} .

This completes our proof.

Corollary 2 will now follow by using this bound to bound the weight of the higher degree Fourier coefficients of such an f and then using the L¹ polynomial regression algorithm of [1].

Proof.

Let f be the indicator function of an intersection of k halfspaces. Let f have Fourier transform given by

f = \sum_{S \subset [n]} χ_{S} \hat{f} (S) .

It is well known that for ρ∈(0,1) that

{ℕ 𝕊}_{ρ} (f) = 2 \sum_{S \subset [n]} (1 - {(1 - 2 ρ)}^{| S |}) {|\hat{f} (S)|}^{2} .

Therefore, we have that

{ℕ 𝕊}_{ρ} (f) ≫ \sum_{| S | > 1 / ρ} {|\hat{f} (S)|}^{2} .

By Corollary 1, this tells us that

\sum_{| S | > 1 / ρ} {|\hat{f} (S)|}^{2} = O (\sqrt{ρ log (k)}) .

Setting ρ=ε²/(C log(k)) for sufficiently large values of C yields

\sum_{| S | > C log (k) ε^{- 2}} {|\hat{f} (S)|}^{2} < ε.

Our claim now follows from [1] Remark 4.

References

Kalai, AT, Klivans, AR, Mansour, Y, Servedio, RA: Agnostically Learning Halfspaces. In: Proceedings of the 46th Foundations of Computer Science (2005).
Google Scholar
Gotsman C, Linial N: Spectral properties of threshold functions. Combinatorica 1994, 14: 35–50. 10.1007/BF01305949
Article MATH MathSciNet Google Scholar
Diakonikolas, I, Raghavendra, P, Servedio, RA, Tan, LY: Bounding the average sensitivity and noise sensitivity of polynomial threshold functions. In: Proceedings of the 42nd ACM Symposium on Theory of Computing. ACM (2010).
Google Scholar
Kane, DM: The Gaussian surface area and noise sensitivity of degree-d polynomial threshold functions. In: Proceedings of the 25th Annual Conference on Computational Complexity, pp. 205–210 (2010).
Google Scholar
Kane, DM: The Correct, Exponent for the Gotsman-Linial Conjecture. In: Proceedings of the 28th Annual Conference on Computational Complexity (2013).
Google Scholar
Adam, R, Klivans, RO, Servedio, RA: Learning geometric concepts via gaussian surface area. In: Proceedings of the 49th Foundations of Computer Science, pp. 541–550 (2008).
Google Scholar
Harsha, P, Kilvans, AR, Meka, R: An invariance principle for polytopes. In: Proceedings of the 42nd ACM Symposium on Theory of Computing (2010).
Google Scholar
Diakonikolas, I, Raghavendra, P, Tan, LY: Average sensitivity and noise sensitivity of polynomial threshold functions. Manuscript available at ., [http://arxiv.org/abs/0909.5011]

Download references

Acknowledgements

This work was done with the support of an NSF postdoctoral research fellowship.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, 9500 Gilman Drive #0404, La Jolla, 92093-0404, CA, USA
Daniel Kane

Authors

Daniel Kane
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Daniel Kane.

Additional information

Competing interests

The author has no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Reprints and permissions

About this article

Cite this article

Kane, D. The average sensitivity of an intersection of half spaces. Mathematical Sciences 1, 13 (2014). https://doi.org/10.1186/s40687-014-0013-6

Download citation

Received: 12 February 2014
Accepted: 10 September 2014
Published: 11 November 2014
DOI: https://doi.org/10.1186/s40687-014-0013-6

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

The average sensitivity of an intersection of half spaces

Abstract