Mathematical Methods of Statistics, Volume 16, Issue 3, pp 260–280

Linear and convex aggregation of density estimators

  • Ph. Rigollet
  • A. B. Tsybakov


We study the problem of finding the best linear and convex combination of M estimators of a density with respect to the mean squared risk. We suggest aggregation procedures and prove sharp oracle inequalities for their risks, i.e., oracle inequalities with leading constant 1. We also obtain lower bounds showing that these procedures attain optimal rates of aggregation. As an example, we consider aggregation of multivariate kernel density estimators with different bandwidths. We show that linear and convex aggregates mimic the kernel oracles in an asymptotically exact sense. We prove that, for Pinsker’s kernel, the proposed aggregates are sharp asymptotically minimax simultaneously over a large scale of Sobolev classes of densities. Finally, we provide simulations demonstrating the performance of the convex aggregation procedure.
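To make the idea of linear aggregation concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the procedure defined in the paper, whose details this page does not give. The sketch assumes a sample split: the candidate estimators are built on one half, and the weights are chosen on the other half by minimising the usual empirical counterpart of the integrated squared error, ‖Σ w_j p_j‖² − (2/n) Σ_i Σ_j w_j p_j(X_i). The one-dimensional Gaussian kernel, the bandwidth grid, the integration grid, and all function names are hypothetical choices made for the example.

```python
import numpy as np

def gaussian_kde_1d(sample, bandwidth):
    """Return x -> Gaussian kernel density estimate built from `sample`."""
    sample = np.asarray(sample, dtype=float)

    def estimate(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        z = (x[:, None] - sample[None, :]) / bandwidth
        norm = len(sample) * bandwidth * np.sqrt(2.0 * np.pi)
        return np.exp(-0.5 * z**2).sum(axis=1) / norm

    return estimate

def linear_aggregation_weights(estimators, holdout, grid):
    """Minimise w^T G w - 2 b^T w over w in R^M (assumed criterion, see lead-in)."""
    dx = grid[1] - grid[0]
    values = np.stack([p(grid) for p in estimators])  # shape (M, len(grid))
    gram = values @ values.T * dx                     # Riemann-sum approximation of <p_j, p_k>
    b = np.array([p(holdout).mean() for p in estimators])
    # The minimiser solves G w = b; lstsq tolerates an ill-conditioned G.
    weights, *_ = np.linalg.lstsq(gram, b, rcond=None)
    return weights

rng = np.random.default_rng(0)
data = rng.normal(size=400)                 # synthetic data, for illustration only
train, holdout = data[:200], data[200:]     # sample split (an assumption of this sketch)
bandwidths = [0.1, 0.3, 0.6, 1.0]           # hypothetical bandwidth grid
estimators = [gaussian_kde_1d(train, h) for h in bandwidths]
grid = np.linspace(-6.0, 6.0, 2001)
weights = linear_aggregation_weights(estimators, holdout, grid)

def aggregate(x):
    """Linear aggregate: weighted sum of the candidate kernel estimators."""
    return sum(w * p(x) for w, p in zip(weights, estimators))
```

A convex aggregate would minimise the same criterion with the weights restricted to the simplex (nonnegative and summing to one), which requires a constrained solver rather than a linear solve.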

Key words

aggregation, oracle inequalities, statistical learning, nonparametric density estimation, sharp minimax adaptivity, kernel estimates of a density

2000 Mathematics Subject Classification

primary 62G07; secondary 62G05, 68T05, 62G20





Copyright information

© Allerton Press, Inc. 2007

Authors and Affiliations

  1. Georgia Institute of Technology, USA
  2. Université Paris VI, France
