Abstract
Designing statistical procedures that are provably almost as accurate as the best one in a given family is one of the central topics in statistics and learning theory. Oracle inequalities then offer a convenient theoretical framework for evaluating different strategies, which can be roughly divided into two classes: selection strategies and aggregation strategies. The ultimate goal is to design strategies satisfying oracle inequalities with leading constant one and a rate-optimal residual term. Many recent papers address this problem in the case where the aim is to beat the best procedure from a given family of linear smoothers. However, the theory developed so far either does not cover the important case of nearest-neighbor smoothers or provides a suboptimal oracle inequality with a leading constant considerably larger than one. In this paper, we prove a new oracle inequality with leading constant one that is valid under a general assumption on linear smoothers, allowing us, for instance, to compete against the best nearest-neighbor filter.
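The abstract does not spell out the aggregation procedure. As a rough illustration of the kind of strategy at stake, the sketch below aggregates one-dimensional k-NN filters by exponentially weighting unbiased risk estimates, in the spirit of Leung and Barron's exponentially weighted aggregation. The Gaussian-noise setting, the temperature 4σ², the toy design, and all function names are illustrative assumptions, not the authors' exact method; in particular, the sketch ignores the non-symmetry of k-NN smoothing matrices, which is precisely the technical difficulty the paper addresses.

```python
import numpy as np

def knn_smoother_matrix(x, k):
    """Row-stochastic k-NN smoothing matrix A_k: (A_k y)_i averages the
    responses of the k design points nearest to x_i (itself included)."""
    n = len(x)
    A = np.zeros((n, n))
    for i in range(n):
        idx = np.argsort(np.abs(x - x[i]))[:k]  # indices of the k nearest neighbors
        A[i, idx] = 1.0 / k
    return A

def ewa_over_knn(x, y, sigma2, ks, beta=None):
    """Exponentially weighted aggregate of k-NN filters, with weights driven
    by Mallows/Stein-type unbiased risk estimates (illustrative sketch)."""
    n = len(y)
    if beta is None:
        beta = 4.0 * sigma2  # temperature; 4*sigma^2 is a common theoretical choice
    fits, risks = [], []
    for k in ks:
        A = knn_smoother_matrix(x, k)
        f = A @ y
        # Unbiased estimate of E||A y - f*||^2 for a linear smoother A:
        # ||y - A y||^2 + 2*sigma^2*tr(A) - n*sigma^2
        r = np.sum((y - f) ** 2) + 2.0 * sigma2 * np.trace(A) - n * sigma2
        fits.append(f)
        risks.append(r)
    risks = np.array(risks)
    w = np.exp(-(risks - risks.min()) / beta)  # shifted for numerical stability
    w /= w.sum()
    return sum(wk * fk for wk, fk in zip(w, fits)), w

# Usage: noisy sine curve, aggregating k-NN filters over a grid of k.
rng = np.random.default_rng(0)
n, sigma2 = 200, 0.25
x = np.sort(rng.uniform(0.0, 1.0, n))
y = np.sin(2 * np.pi * x) + np.sqrt(sigma2) * rng.normal(size=n)
f_hat, weights = ewa_over_knn(x, y, sigma2, ks=range(2, 40))
```

A sharp oracle inequality of the type claimed in the abstract would then bound the risk of such an aggregate by the risk of the best k-NN filter in the family, with leading constant one, plus a residual term of order (log of the family size)/n.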
Cite this paper
Dalalyan, A.S., Salmon, J.: Competing against the Best Nearest Neighbor Filter in Regression. In: Kivinen, J., Szepesvári, C., Ukkonen, E., Zeugmann, T. (eds.) Algorithmic Learning Theory (ALT 2011). Lecture Notes in Computer Science, vol. 6925. Springer, Berlin, Heidelberg (2011). https://doi.org/10.1007/978-3-642-24412-4_13