Machine Learning, Volume 80, Issue 2–3, pp 273–294

Polynomial regression under arbitrary product distributions



In recent work, Kalai, Klivans, Mansour, and Servedio (2005) studied a variant of the “Low-Degree (Fourier) Algorithm” for learning under the uniform probability distribution on \(\{0,1\}^n\). They showed that the \(L_1\) polynomial regression algorithm yields agnostic (tolerant to arbitrary noise) learning algorithms with respect to the class of threshold functions, under certain restricted instance distributions, including uniform on \(\{0,1\}^n\) and Gaussian on \(\mathbb{R}^n\). In this work we show how all learning results based on the Low-Degree Algorithm can be generalized to give almost identical agnostic guarantees under arbitrary product distributions on instance spaces \(X_1 \times \cdots \times X_n\). We also extend these results to learning under mixtures of product distributions.
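As a rough illustration of the \(L_1\) polynomial regression algorithm of Kalai et al. (2005), the fitting step can be written as follows. The notation is an assumption made here and does not appear in the abstract: \((x^1, y^1), \dots, (x^m, y^m)\) are labeled examples with \(y^i \in \{-1, 1\}\), \(d\) is a degree bound, and \(t\) is a threshold chosen to minimize the empirical error.

\[
p^{*} \in \operatorname*{arg\,min}_{\deg(p) \le d} \; \frac{1}{m} \sum_{i=1}^{m} \bigl| p(x^i) - y^i \bigr|,
\qquad
h(x) = \operatorname{sgn}\bigl(p^{*}(x) - t\bigr), \quad t \in [-1, 1].
\]

The minimization is a linear program over the coefficients of \(p\), which is what makes the \(L_1\) objective tractable.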

The main technical innovation is the use of (Hoeffding) orthogonal decomposition and the extension of the “noise sensitivity method” to arbitrary product spaces. In particular, we give a very simple proof that threshold functions over arbitrary product spaces have δ-noise sensitivity \(O(\sqrt{\delta})\), resolving an open problem suggested by Peres (2004).
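To make the notion of noise sensitivity over a product space concrete, the sketch below estimates it by Monte Carlo for a threshold function. It assumes one common convention, which the abstract does not spell out: \(x\) is drawn from the product distribution, \(y\) is formed by independently redrawing each coordinate of \(x\) with probability \(\delta\), and the noise sensitivity is \(\Pr[f(x) \ne f(y)]\). All function names and the example distributions are hypothetical, chosen only for illustration.

```python
import random

def threshold(weights, theta):
    """Return f(x) = +1 if sum_i w_i * x_i >= theta, else -1."""
    def f(x):
        return 1 if sum(w * xi for w, xi in zip(weights, x)) >= theta else -1
    return f

def rerandomize(x, samplers, delta, rng):
    """Redraw each coordinate from its own distribution with probability delta."""
    return [s(rng) if rng.random() < delta else xi
            for xi, s in zip(x, samplers)]

def estimate_noise_sensitivity(f, samplers, delta, trials=20000, seed=0):
    """Monte Carlo estimate of Pr[f(x) != f(y)] (assumed convention, see lead-in)."""
    rng = random.Random(seed)
    disagreements = 0
    for _ in range(trials):
        x = [s(rng) for s in samplers]            # x ~ product distribution
        y = rerandomize(x, samplers, delta, rng)  # y = noisy copy of x
        if f(x) != f(y):
            disagreements += 1
    return disagreements / trials

# Hypothetical product space: biased bits, a ternary coordinate type, Gaussians.
samplers = (
    [lambda rng: 1 if rng.random() < 0.3 else -1] * 5
    + [lambda rng: rng.choice([0, 1, 2])] * 5
    + [lambda rng: rng.gauss(0.0, 1.0)] * 5
)
f = threshold([1.0] * 15, theta=3.0)

if __name__ == "__main__":
    for delta in (0.01, 0.04, 0.16):
        print(delta, estimate_noise_sensitivity(f, samplers, delta))
```

Running the script for a few values of \(\delta\) gives a quick empirical check against the \(O(\sqrt{\delta})\) bound; it is a sanity check under the stated assumptions, not a reproduction of the paper's proof.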


Keywords: Agnostic learning, Polynomial regression, Linear threshold functions, Noise sensitivity


References

  1. Benjamini, I., Kalai, G., & Schramm, O. (1999). Noise sensitivity of Boolean functions and applications to percolation. Publications Mathématiques, 90(1), 5–43.
  2. Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.
  3. Feldman, J., O’Donnell, R., & Servedio, R. (2005). Learning mixtures of product distributions over discrete domains. In Proc. 46th IEEE symp. on foundations of comp. sci. (pp. 501–510).
  4. Feldman, J., O’Donnell, R., & Servedio, R. (2006). PAC learning mixtures of Gaussians with no separation assumption. In Proc. 19th workshop on comp. learning theory (pp. 20–34).
  5. Furst, M., Jackson, J., & Smith, S. (1991). Improved learning of \(AC^0\) functions. In Proc. 4th workshop on comp. learning theory (pp. 317–325).
  6. Guruswami, V., & Raghavendra, P. (2006). Hardness of learning halfspaces with noise. In Proc. 47th IEEE symp. on foundations of comp. sci. (pp. 543–552).
  7. Håstad, J. (2001). A slight sharpening of LMN. Journal of Computer and System Sciences, 63(3), 498–508.
  8. Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. Annals of Mathematical Statistics, 19(3), 293–325.
  9. Kalai, A. (2006). Machine learning theory course notes.
  10. Kalai, A., Klivans, A., Mansour, Y., & Servedio, R. (2005). Agnostically learning halfspaces. In Proc. 46th IEEE symp. on foundations of comp. sci. (pp. 11–20).
  11. Karlin, S., & Rinott, Y. (1982). Applications of Anova type decompositions for comparisons of conditional variance statistics including jackknife estimates. Annals of Statistics, 10(2), 485–501.
  12. Kearns, M., Schapire, R., & Sellie, L. (1994). Toward efficient agnostic learning. Machine Learning, 17(2), 115–141.
  13. Klivans, A., O’Donnell, R., & Servedio, R. (2004). Learning intersections and thresholds of halfspaces. Journal of Computer and System Sciences, 68(4), 808–840.
  14. Lee, W. S., Bartlett, P., & Williamson, R. (1995). On efficient agnostic learning of linear combinations of basis functions. In Proc. 8th workshop on comp. learning theory (pp. 369–376).
  15. Linial, N., Mansour, Y., & Nisan, N. (1993). Constant depth circuits, Fourier transform, and learnability. Journal of the ACM, 40(3), 607–620.
  16. Minsky, M., & Papert, S. (1969). Perceptrons. Cambridge: MIT Press.
  17. Mossel, E., O’Donnell, R., & Oleszkiewicz, K. (2005). Noise stability of functions with low influences: invariance and optimality. In Proc. 46th IEEE symp. on foundations of comp. sci. (pp. 21–30).
  18. O’Donnell, R. (2003). Computational aspects of noise sensitivity. PhD thesis, MIT.
  19. O’Donnell, R., & Servedio, R. (2003). New degree bounds for polynomial threshold functions. In Proc. 35th ACM symp. on the theory of computing (pp. 325–334).
  20. Peres, Y. (2004). Noise stability of weighted majority. arXiv:math/0412377v1.
  21. Steele, J. M. (1986). An Efron-Stein inequality for nonsymmetric statistics. Annals of Statistics, 14(2), 753–758.
  22. Valiant, L. (1984). A theory of the learnable. Communications of the ACM, 27(11), 1134–1142.
  23. von Mises, R. (1947). On the asymptotic distribution of differentiable statistical functions. Annals of Mathematical Statistics, 18(3), 309–348.

Copyright information

© The Author(s) 2010

Authors and Affiliations

  1. Carnegie Mellon University, Pittsburgh, USA
  2. Duquesne University, Pittsburgh, USA
