
Machine Learning, Volume 80, Issue 2–3, pp 273–294

Polynomial regression under arbitrary product distributions

  • Eric Blais
  • Ryan O’Donnell
  • Karl Wimmer

Abstract

In recent work, Kalai, Klivans, Mansour, and Servedio (2005) studied a variant of the “Low-Degree (Fourier) Algorithm” for learning under the uniform probability distribution on \(\{0,1\}^n\). They showed that the \(L_1\) polynomial regression algorithm yields agnostic (tolerant to arbitrary noise) learning algorithms with respect to the class of threshold functions, under certain restricted instance distributions, including the uniform distribution on \(\{0,1\}^n\) and the Gaussian distribution on \(\mathbb{R}^n\). In this work we show how all learning results based on the Low-Degree Algorithm can be generalized to give almost identical agnostic guarantees under arbitrary product distributions on instance spaces \(X_1 \times \cdots \times X_n\). We also extend these results to learning under mixtures of product distributions.
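To make the algorithmic template concrete, here is a minimal Python sketch of \(L_1\) polynomial regression followed by thresholding, in the spirit of the KKMS approach. It is an illustration, not code from the paper: the \(\{-1,1\}^n\) encoding, the monomial feature map, the LP solver (scipy.optimize.linprog), and all function names are assumptions made for this example, and the degree d would in practice be set by the noise-sensitivity bounds discussed below.

```python
# Hypothetical sketch of degree-d L1 polynomial regression for agnostically
# learning threshold functions (illustrative only; not the paper's code).
import itertools
import numpy as np
from scipy.optimize import linprog

def monomial_features(X, d):
    """Map each row of X (entries in {-1, +1}) to all monomials of degree <= d."""
    m, n = X.shape
    cols = [np.ones(m)]  # the empty monomial (constant term)
    for k in range(1, d + 1):
        for S in itertools.combinations(range(n), k):
            cols.append(np.prod(X[:, list(S)], axis=1))
    return np.column_stack(cols)

def l1_poly_regress(X, y, d):
    """Minimize sum_i |p(x_i) - y_i| over degree-<=d polynomials p, as an LP:
    variables are the coefficients c plus slacks t_i >= |(Ac - y)_i|."""
    A = monomial_features(X, d)
    m, k = A.shape
    obj = np.concatenate([np.zeros(k), np.ones(m)])   # minimize sum of slacks
    A_ub = np.block([[A, -np.eye(m)],                 #  (Ac - y)_i <= t_i
                     [-A, -np.eye(m)]])               # -(Ac - y)_i <= t_i
    b_ub = np.concatenate([y, -y])
    bounds = [(None, None)] * k + [(0, None)] * m     # c free, t >= 0
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:k]

def learn_threshold(X, y, d):
    """Fit a low-degree polynomial to the +/-1 labels, then round it at the
    threshold theta achieving the smallest empirical error."""
    c = l1_poly_regress(X, y.astype(float), d)
    p = monomial_features(X, d) @ c
    candidates = np.concatenate([p, [p.min() - 1.0]])  # includes "always +1"
    errs = [np.mean(np.where(p >= t, 1, -1) != y) for t in candidates]
    theta = candidates[int(np.argmin(errs))]
    return lambda Z: np.where(monomial_features(Z, d) @ c >= theta, 1, -1)
```

The LP encoding used here is the standard trick for absolute-value objectives: each residual magnitude \(|(Ac - y)_i|\) is replaced by a slack variable \(t_i\) constrained from both sides, so the whole fit reduces to one linear program over \(O(n^d)\) coefficients.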

The main technical innovation is the use of (Hoeffding) orthogonal decomposition and the extension of the “noise sensitivity method” to arbitrary product spaces. In particular, we give a very simple proof that threshold functions over arbitrary product spaces have δ-noise sensitivity \(O(\sqrt{\delta})\), resolving an open problem suggested by Peres (2004).
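For concreteness, here is the standard definition of the quantity in question (paraphrased for the reader, not quoted from the paper). For a function \(f\) on a product space and \(\delta \in [0,1]\),
\[
\mathrm{NS}_\delta(f) \;=\; \Pr_{x,\,y}\bigl[f(x) \neq f(y)\bigr],
\]
where \(x\) is drawn from the product distribution on \(X_1 \times \cdots \times X_n\) and \(y\) is obtained from \(x\) by independently rerandomizing each coordinate \(x_i\) from its own marginal with probability \(\delta\) (and leaving it unchanged with probability \(1-\delta\)). The theorem stated above says that for every threshold function this probability is \(O(\sqrt{\delta})\), uniformly over the choice of product distribution.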

Keywords

Agnostic learning · Polynomial regression · Linear threshold functions · Noise sensitivity

References

  1. Benjamini, I., Kalai, G., & Schramm, O. (1999). Noise sensitivity of Boolean functions and applications to percolation. Publications Mathématiques, 90(1), 5–43.
  2. Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.
  3. Feldman, J., O’Donnell, R., & Servedio, R. (2005). Learning mixtures of product distributions over discrete domains. In Proc. 46th IEEE symp. on foundations of comp. sci. (pp. 501–510).
  4. Feldman, J., O’Donnell, R., & Servedio, R. (2006). PAC learning mixtures of Gaussians with no separation assumption. In Proc. 19th workshop on comp. learning theory (pp. 20–34).
  5. Furst, M., Jackson, J., & Smith, S. (1991). Improved learning of AC0 functions. In Proc. 4th workshop on comp. learning theory (pp. 317–325).
  6. Guruswami, V., & Raghavendra, P. (2006). Hardness of learning halfspaces with noise. In Proc. 47th IEEE symp. on foundations of comp. sci. (pp. 543–552).
  7. Håstad, J. (2001). A slight sharpening of LMN. Journal of Computer and System Sciences, 63(3), 498–508.
  8. Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. Annals of Mathematical Statistics, 19(3), 293–325.
  9. Kalai, A. (2006). Machine learning theory course notes. http://www.cc.gatech.edu/~atk/teaching/mlt06/lectures/mlt-06-10.pdf.
  10. Kalai, A., Klivans, A., Mansour, Y., & Servedio, R. (2005). Agnostically learning halfspaces. In Proc. 46th IEEE symp. on foundations of comp. sci. (pp. 11–20).
  11. Karlin, S., & Rinott, Y. (1982). Applications of ANOVA type decompositions for comparisons of conditional variance statistics including jackknife estimates. Annals of Statistics, 10(2), 485–501.
  12. Kearns, M., Schapire, R., & Sellie, L. (1994). Toward efficient agnostic learning. Machine Learning, 17(2), 115–141.
  13. Klivans, A., O’Donnell, R., & Servedio, R. (2004). Learning intersections and thresholds of halfspaces. Journal of Computer and System Sciences, 68(4), 808–840.
  14. Lee, W. S., Bartlett, P., & Williamson, R. (1995). On efficient agnostic learning of linear combinations of basis functions. In Proc. 8th workshop on comp. learning theory (pp. 369–376).
  15. Linial, N., Mansour, Y., & Nisan, N. (1993). Constant depth circuits, Fourier transform, and learnability. Journal of the ACM, 40(3), 607–620.
  16. Minsky, M., & Papert, S. (1969). Perceptrons. Cambridge: MIT Press.
  17. Mossel, E., O’Donnell, R., & Oleszkiewicz, K. (2005). Noise stability of functions with low influences: invariance and optimality. In Proc. 46th IEEE symp. on foundations of comp. sci. (pp. 21–30).
  18. O’Donnell, R. (2003). Computational aspects of noise sensitivity. PhD thesis, MIT.
  19. O’Donnell, R., & Servedio, R. (2003). New degree bounds for polynomial threshold functions. In Proc. 35th ACM symp. on the theory of computing (pp. 325–334).
  20. Peres, Y. (2004). Noise stability of weighted majority. arXiv:math/0412377v1.
  21. Steele, J. M. (1986). An Efron-Stein inequality for nonsymmetric statistics. Annals of Statistics, 14(2), 753–758.
  22. Valiant, L. (1984). A theory of the learnable. Communications of the ACM, 27(11), 1134–1142.
  23. von Mises, R. (1947). On the asymptotic distribution of differentiable statistical functions. Annals of Mathematical Statistics, 18(3), 309–348.

Copyright information

© The Author(s) 2010

Authors and Affiliations

  1. Carnegie Mellon University, Pittsburgh, USA
  2. Duquesne University, Pittsburgh, USA
