Improved Approximation of Linear Threshold Functions

Abstract

We prove two main results on how arbitrary linear threshold functions \({f(x) = {\rm sign}(w \cdot x - \theta)}\) over the n-dimensional Boolean hypercube can be approximated by simple threshold functions. Our first result shows that every n-variable threshold function f is \({\epsilon}\)-close to a threshold function depending only on \({{\rm Inf}(f)^2 \cdot {\rm poly}(1/\epsilon)}\) many variables, where \({{\rm Inf}(f)}\) denotes the total influence (also called the average sensitivity) of f. For the case of threshold functions, this is an exponential sharpening of Friedgut’s well-known theorem (Friedgut in Combinatorica 18(1):474–483, 1998), which states that every Boolean function f is \({\epsilon}\)-close to a function depending only on \({2^{O({\rm Inf}(f)/\epsilon)}}\) many variables. We complement this upper bound by showing that \({\Omega({\rm Inf}(f)^2 + 1/\epsilon^2)}\) many variables are required for \({\epsilon}\)-approximating threshold functions. Our second result is a proof that every n-variable threshold function is \({\epsilon}\)-close to a threshold function with integer weights at most \({{\rm poly}(n) \cdot 2^{\tilde{O}(1/\epsilon^{2/3})}}\). This improves, in the dependence on the error parameter \({\epsilon}\), on an earlier result of Servedio (Comput Complex 16(2):180–209, 2007), which gave a \({{\rm poly}(n) \cdot 2^{\tilde{O}(1/\epsilon^{2})}}\) bound. Our improvement is obtained via a new proof technique that uses strong anti-concentration bounds from probability theory. The new technique also gives a simple and modular proof of the original result of Servedio (Comput Complex 16(2):180–209, 2007) and extends to give low-weight approximators for threshold functions under a range of probability distributions other than the uniform distribution.
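
As a small illustration of the quantities named in the abstract (not taken from the paper), the following Python sketch evaluates a threshold function by brute force on a tiny hypercube, computes its total influence, and measures its distance from a threshold function on fewer variables. The helper names `ltf`, `total_influence`, and `distance`, the convention sign(0) = +1, and the example weights are all assumptions made here for illustration. The enumeration is exponential in n and is unrelated to the paper's proof technique.

```python
from itertools import product


def ltf(weights, theta):
    """Return f(x) = sign(w . x - theta) on {-1, +1}^n.

    Assumption for this sketch: sign(0) is taken to be +1.
    """
    def f(x):
        value = sum(w * xi for w, xi in zip(weights, x)) - theta
        return 1 if value >= 0 else -1
    return f


def total_influence(f, n):
    """Sum over coordinates i of Pr_x[f(x) != f(x with coordinate i flipped)],
    where x is uniform over {-1, +1}^n (computed by full enumeration)."""
    flips = 0
    for x in product((-1, 1), repeat=n):
        fx = f(x)
        for i in range(n):
            y = x[:i] + (-x[i],) + x[i + 1:]
            if f(y) != fx:
                flips += 1
    return flips / 2 ** n


def distance(f, g, n):
    """Fraction of points of {-1, +1}^n on which f and g disagree.
    f is epsilon-close to g when this value is at most epsilon."""
    points = list(product((-1, 1), repeat=n))
    return sum(f(x) != g(x) for x in points) / len(points)


# Hypothetical example: a 3-variable threshold function and a
# 1-variable approximator that keeps only the heaviest weight.
n = 3
f = ltf([3, 2, 2], 0)
g = ltf([1, 0, 0], 0)
print("Inf(f) =", total_influence(f, n))
print("dist(f, g) =", distance(f, g, n))
```

With these example weights f coincides with the majority of three bits, so its total influence is 1.5, and the one-variable function g disagrees with it on 2 of the 8 inputs.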

References

  1. M. Aizenman, F. Germinet, A. Klein, S. Warzel (2009) On Bernoulli decompositions for random variables, concentration bounds, and spectral localization. Probability Theory and Related Fields 143(1–2): 219–238

  2. N. Alon, V. H. Vu (1997) Anti-Hadamard Matrices, Coin Weighing, Threshold Gates, and Indecomposable Hypergraphs. Journal of Combinatorial Theory, Series A 79(1): 133–160

  3. M. Anthony, G. Brightwell, J. Shawe-Taylor (1995) On specifying Boolean functions using labelled examples. Discrete Applied Mathematics 61: 1–25

  4. W. Beckner (1975) Inequalities in Fourier analysis. Annals of Mathematics 102: 159–182

  5. R. Beigel (1994). Perceptrons, PP, and the Polynomial Hierarchy. Computational Complexity 4, 339–349.

  6. A. Beimel, E. Weinreb (2006) Monotone circuits for monotone weighted threshold functions. Information Processing Letters 97(1): 12–18

  7. M. Bellare & J. Rompel (1994). Randomness-Efficient Oblivious Sampling. In 35th Annual Symposium on Foundations of Computer Science (FOCS), 276–287.

  8. A. Bonami (1970) Étude des coefficients de Fourier des fonctions de \({L^p(G)}\). Ann. Inst. Fourier (Grenoble) 20(2): 335–402

  9. J. Bourgain (2002) On the distributions of the Fourier spectrum of Boolean functions. Israel J. Math. 131: 269–276

  10. J. Bruck, R. Smolensky (1992) Polynomial threshold functions, \({{\rm AC}^0}\) functions and spectral norms. SIAM Journal on Computing 21(1): 33–42

  11. N. Bshouty, C. Tamon (1996) On the Fourier spectrum of monotone functions. Journal of the ACM 43(4): 747–770

  12. S. Chawla, R. Krauthgamer, R. Kumar, Y. Rabani, D. Sivakumar (2006) On the Hardness of Approximating Multicut and Sparsest-Cut. Computational Complexity 15(2): 94–114

  13. M. Dertouzos (1965). Threshold Logic: A Synthesis Approach. MIT Press, Cambridge, MA.

  14. I. Diakonikolas, P. Gopalan, R. Jaiswal, R. Servedio, E. Viola (2010) Bounded independence fools halfspaces. SIAM J. on Comput. 39(8): 3441–3462

  15. I. Dinur, S. Safra (2005) On the Hardness of Approximating Minimum Vertex-Cover. Annals of Mathematics 162(1): 439–485

  16. W. Doeblin, P. Lévy (1936) Calcul des probabilités. Sur les sommes de variables aléatoires indépendantes à dispersions bornées inférieurement. C.R. Acad. Sci. 202: 2027–2029

  17. D. Dubhashi, A. Panconesi (2009) Concentration of measure for the analysis of randomized algorithms. Cambridge University Press, Cambridge

  18. P. Erdős (1945) On a lemma of Littlewood and Offord. Bull. Amer. Math. Soc. 51: 898–902

  19. P. Erdős (1965) Extremal Problems in Number Theory. Proc. Sympos. Pure Math. 8: 181–189

  20. C.G. Esséen (1968) On the concentration function of a sum of independent random variables. Z. Wahrscheinlichkeitstheorie verw. Geb. 9: 290–308

  21. V. Feldman, P. Gopalan, S. Khot, A. Ponnuswami (2009) On Agnostic Learning of Parities, Monomials, and Halfspaces. SIAM J. Comput. 39(2): 606–645

  22. W. Feller (1968). An introduction to probability theory and its applications. John Wiley & Sons.

  23. A. Fiat & D. Pechyony (2004). Decision Trees: More Theoretical Justification for Practical Algorithms. In Algorithmic Learning Theory, 15th International Conference (ALT 2004), 156–170.

  24. E. Friedgut (1998) Boolean functions with low average sensitivity depend on few coordinates. Combinatorica 18(1): 474–483

  25. E. Friedgut, G. Kalai (1996) Every Monotone Graph Property has a Sharp Threshold. Proc. Amer. Math. Soc. 124: 2993–3002

  26. D. Glasner, R. A. Servedio (2009). Distribution-Free Testing Lower Bound for Basic Boolean Functions. Theory of Computing 5(1), 191–216.

  27. P. Goldberg (2006) A Bound on the Precision Required to Estimate a Boolean Perceptron from its Average Satisfying Assignment. SIAM Journal on Discrete Mathematics 20: 328–343

  28. L. Gross (1975) Logarithmic Sobolev inequalities. Amer. J. Math. 97(4): 1061–1083

  29. G. Halász (1977) Estimates for the concentration function of combinatorial number theory and probability. Period. Math. Hungar. 8(3): 197–211

  30. S. Hampson, D. Volper (1986) Linear function neurons: structure and training. Biological Cybernetics 53: 203–217

  31. J. Håstad (1994) On the size of weights for threshold gates. SIAM Journal on Discrete Mathematics 7(3): 484–492

  32. J. Håstad (2005). Personal communication.

  33. J. Hong (1987). On connectionist models. Technical Report 87-012, Dept. of Computer Science, University of Chicago.

  34. J. Kahn, G. Kalai & N. Linial (1988). The influence of variables on Boolean functions. In Proc. 29th Annual Symposium on Foundations of Computer Science (FOCS), 68–80.

  35. A. Kalai (2007). Learning Nested Halfspaces and Uphill Decision Trees. In Proc. 20th Annual Conference on Learning Theory (COLT), 378–392.

  36. A. Kalai, A. Klivans, Y. Mansour, R. Servedio (2008) Agnostically Learning Halfspaces. SIAM Journal on Computing 37(6): 1777–1805

  37. S. Khot, O. Regev (2008) Vertex cover might be hard to approximate to within \({2 - \epsilon}\). Journal of Computer & System Sciences 74(3): 335–349

  38. S. Khot, R. Saket (2011) On the hardness of learning intersections of two halfspaces. J. Comput. Syst. Sci. 77(1): 129–141

  39. A.N. Kolmogorov (1958-60). Sur les propriétés des fonctions de concentration de M. P. Lévy. Ann. Inst. H. Poincaré 16, 27–34.

  40. R. Krauthgamer, Y. Rabani (2009) Improved Lower Bounds for Embeddings into \({L_1}\). SIAM J. Comput. 38(6): 2487–2498

  41. J. E. Littlewood & A. C. Offord (1943). On the number of real roots of a random algebraic equation. III. Rec. Math. [Mat. Sbornik] N.S. 12, 277–286.

  42. K. Matulef, R. O’Donnell, R. Rubinfeld, R. Servedio (2010) Testing Halfspaces. SIAM J. on Comput. 39(5): 2004–2047

  43. S. Muroga (1971) Threshold logic and its applications. Wiley-Interscience, New York.

  44. S. Muroga, I. Toda, S. Takasu (1961) Theory of majority switching elements. J. Franklin Institute 271: 376–418

  45. N. Nisan, M. Szegedy (1994) On the degree of Boolean functions as real polynomials. Comput. Complexity 4: 301–313

  46. R. O’Donnell, R. Servedio (2007) Learning monotone decision trees in polynomial time. SIAM J. on Comput. 37(3): 827–844

  47. R. O’Donnell, R. Servedio (2011) The Chow Parameters Problem. SIAM J. on Comput. 40(1): 165–199

  48. Y. Rabani, A. Shpilka (2010) Explicit construction of a small epsilon-net for linear threshold functions. SIAM J. on Comput. 39(8): 3501–3520

  49. P. Raghavan (1988). Learning in threshold networks. In First Workshop on Computational Learning Theory, 19–27.

  50. B.A. Rogozin (1973). An integral-type estimate for concentration functions of sums of independent random variables. Dokl. Akad. Nauk SSSR 211, 1067–1070.

  51. M. Rudelson, R. Vershynin (2008) The Littlewood-Offord Problem and invertibility of random matrices. Advances in Mathematics 218(2): 600–633

  52. A. Sárközy & E. Szemerédi (1965). Über ein Problem von Erdös und Moser. Acta Arithmetica 11, 205–208.

  53. R. Servedio (2007) Every linear threshold function has a low-weight approximator. Comput. Complexity 16(2): 180–209

  54. A.A. Sherstov (2008) Halfspace Matrices. Comput. Complexity 17(2): 149–178

  55. I.S. Shiganov (1986). Refinement of the upper bound of the constant in the central limit theorem. Journal of Soviet Mathematics 2545–2550.

  56. K.-Y. Siu, V.P. Roychowdhury & T. Kailath (1995). Discrete Neural Computation: A Theoretical Foundation. Prentice-Hall, Englewood Cliffs, NJ.

  57. T. Tao & V. Vu (2006). Additive Combinatorics. Cambridge Studies in Advanced Mathematics, Cambridge University Press, Cambridge.

  58. T. Tao, V. Vu (2009) From the Littlewood-Offord problem to the Circular Law: universality of the spectral distribution of random matrices. Bull. Amer. Math. Soc. 46: 377–396

  59. V. Vu (2008). A Structural Approach to Subset-Sum Problems. In Building Bridges, volume 19 of Bolyai Society Mathematical Studies, 525–545. Springer, Berlin, Heidelberg.

  60. A. Wigderson (1994). The amazing power of pairwise independence. In Proceedings of the 26th ACM Symposium on Theory of Computing, 645–647.

Author information

Corresponding author

Correspondence to Rocco A. Servedio.

About this article

Cite this article

Diakonikolas, I., Servedio, R.A. Improved Approximation of Linear Threshold Functions. Comput. Complex. 22, 623–677 (2013). https://doi.org/10.1007/s00037-012-0045-5

Keywords

  • Threshold functions
  • Total influence
  • Average sensitivity
  • Integer-weight approximation
  • Juntas

Subject classification

  • 68R99