Competing against the Best Nearest Neighbor Filter in Regression

  • Arnak S. Dalalyan
  • Joseph Salmon
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6925)

Abstract

Designing statistical procedures that are provably almost as accurate as the best one in a given family is one of the central topics in statistics and learning theory. Oracle inequalities offer a convenient theoretical framework for evaluating the different strategies, which can be roughly classified into two groups: selection and aggregation strategies. The ultimate goal is to design strategies satisfying oracle inequalities with leading constant one and a rate-optimal residual term. Many recent papers address this problem in the case where the aim is to beat the best procedure from a given family of linear smoothers. However, the theory developed so far either does not cover the important case of nearest-neighbor smoothers or provides a suboptimal oracle inequality with a leading constant considerably larger than one. In this paper, we prove a new oracle inequality with leading constant one that is valid under a general assumption on linear smoothers and that allows us, for instance, to compete against the best nearest-neighbor filter.
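
To fix ideas, the kind of result described above, specialized to the family of k-nearest-neighbor filters, can be sketched as follows; this is only an illustrative template consistent with the abstract, not the exact statement proved in the paper, and both the constant C and the residual term depend on the assumptions made there. Writing \hat f_k for the k-nearest-neighbor smoother and \|g\|_n^2 = n^{-1}\sum_{i=1}^n g(x_i)^2 for the empirical norm, a sharp oracle inequality takes the form

  \mathbb{E}\big[\|\hat f - f\|_n^2\big] \;\le\; \min_{1 \le k \le n} \mathbb{E}\big[\|\hat f_k - f\|_n^2\big] \;+\; C\,\frac{\sigma^2 \log n}{n},

where \sigma^2 is the noise variance and C > 0 is a numerical constant. The essential feature is that the oracle risk \min_k \mathbb{E}[\|\hat f_k - f\|_n^2] enters with leading constant one; an inequality with a constant strictly larger than one in front of this term is the suboptimal situation the paper seeks to avoid.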

Keywords

adaptive smoothing, nonparametric regression, supervised learning


Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Arnak S. Dalalyan (1, 2)
  • Joseph Salmon (1, 2)
  1. Université Paris Est, Ecole des Ponts ParisTech, Marne-la-Vallée Cedex 2, France
  2. Electrical and Computer Engineering, Duke University, Durham, USA
