Competing against the Best Nearest Neighbor Filter in Regression

  • Arnak S. Dalalyan
  • Joseph Salmon
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6925)


Designing statistical procedures that are provably almost as accurate as the best one in a given family is one of the central topics in statistics and learning theory. Oracle inequalities then offer a convenient theoretical framework for evaluating different strategies, which can be roughly classified into two groups: selection and aggregation strategies. The ultimate goal is to design strategies satisfying oracle inequalities with leading constant one and a rate-optimal residual term. Many recent papers address this problem in the case where the aim is to beat the best procedure from a given family of linear smoothers. However, the theory developed so far either does not cover the important case of nearest-neighbor smoothers or provides a suboptimal oracle inequality with a leading constant considerably larger than one. In this paper, we prove a new oracle inequality with leading constant one that is valid under a general assumption on linear smoothers, allowing one, for instance, to compete against the best nearest-neighbor filter.


Keywords: adaptive smoothing, nonparametric regression, supervised learning
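The setting of the abstract can be made concrete with a small sketch. A k-nearest-neighbor filter is a linear smoother: the fitted values are A_k y, where each row of the matrix A_k averages the responses of the k nearest design points. One standard way to compete with the best member of such a family (not necessarily the exact procedure analyzed in the paper) is exponentially weighted aggregation driven by Stein's unbiased risk estimate. All numerical choices below (the grid of k, the noise level, the temperature beta) are illustrative assumptions.

```python
import numpy as np

def knn_smoother_matrix(x, k):
    """Weight matrix A_k of the k-NN filter on a 1-D design: row i puts
    mass 1/k on the k design points nearest to x[i], so f_hat = A_k @ y."""
    n = len(x)
    A = np.zeros((n, n))
    for i in range(n):
        nn = np.argsort(np.abs(x - x[i]))[:k]  # indices of the k nearest neighbors
        A[i, nn] = 1.0 / k
    return A

rng = np.random.default_rng(0)
n = 200
x = np.sort(rng.uniform(0.0, 1.0, n))
f = np.sin(2 * np.pi * x)                      # unknown regression function (toy choice)
sigma = 0.3                                    # noise level, assumed known here
y = f + sigma * rng.normal(size=n)

# Candidate family: k-NN filters over a grid of k. For a linear smoother
# A, SURE = ||y - A y||^2 + 2 sigma^2 tr(A) - n sigma^2 is an unbiased
# estimate of the risk ||A y - f||^2 in expectation.
ks = list(range(1, 41))
estimates, sure_vals = [], []
for k in ks:
    A = knn_smoother_matrix(x, k)
    f_hat = A @ y
    sure = np.sum((y - f_hat) ** 2) + 2 * sigma**2 * np.trace(A) - n * sigma**2
    estimates.append(f_hat)
    sure_vals.append(sure)

# Exponentially weighted aggregate: weights proportional to exp(-SURE/beta).
# beta = 4 sigma^2 is a commonly used temperature, taken here as an assumption.
sure_vals = np.array(sure_vals)
beta = 4 * sigma**2
w = np.exp(-(sure_vals - sure_vals.min()) / beta)
w /= w.sum()
f_ewa = np.tensordot(w, np.array(estimates), axes=1)

oracle_mse = min(np.mean((e - f) ** 2) for e in estimates)
print(f"oracle MSE = {oracle_mse:.4f}, EWA MSE = {np.mean((f_ewa - f) ** 2):.4f}")
```

An oracle inequality with leading constant one says, roughly, that the risk of the aggregate is bounded by the risk of the best k-NN filter in the family plus a residual term that vanishes at the optimal rate; the sketch above lets one observe that closeness empirically.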





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Arnak S. Dalalyan (1, 2)
  • Joseph Salmon (1, 2)
  1. Université Paris Est, Ecole des Ponts ParisTech, Marne-la-Vallée Cedex 2, France
  2. Electrical and Computer Engineering, Duke University, Durham
