
Linear and convex aggregation of density estimators


We study the problem of finding the best linear and the best convex combination of M estimators of a density with respect to the mean squared risk. We propose aggregation procedures and prove sharp oracle inequalities for their risks, i.e., oracle inequalities with leading constant 1. We also obtain lower bounds showing that these procedures attain optimal rates of aggregation. As an example, we consider aggregation of multivariate kernel density estimators with different bandwidths. We show that the linear and convex aggregates mimic the kernel oracles in an asymptotically exact sense. We prove that, for Pinsker’s kernel, the proposed aggregates are sharp asymptotically minimax simultaneously over a large scale of Sobolev classes of densities. Finally, we provide simulations demonstrating the performance of the convex aggregation procedure.
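The paper's own aggregation procedures are defined in the full text; as a minimal illustrative sketch of the general idea (not the authors' estimator), one can form a convex combination of Gaussian kernel density estimates with different bandwidths by sample splitting: fit the M estimators on one half of the data and choose weights on the simplex that minimize an empirical L2-risk criterion on the other half. All names, the candidate bandwidths, and the grid approximation of the Gram matrix below are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def gaussian_kde(train, h):
    """One-dimensional kernel density estimate with a Gaussian kernel and bandwidth h."""
    def p(t):
        t = np.atleast_1d(t)
        u = (t[:, None] - train[None, :]) / h
        return np.exp(-0.5 * u ** 2).sum(axis=1) / (len(train) * h * np.sqrt(2 * np.pi))
    return p

# Sample splitting: build the M estimators on one half, aggregate on the other.
x = rng.standard_normal(400)
train, valid = x[:200], x[200:]

bandwidths = [0.1, 0.3, 0.5, 1.0]                # M candidate bandwidths (assumed)
estimators = [gaussian_kde(train, h) for h in bandwidths]
M = len(estimators)

# Empirical L2 risk of p_lam = sum_j lam_j p_j, up to a constant in lam:
#   lam' G lam - 2 b' lam,  with G_jk = \int p_j p_k and b_j ≈ E[p_j(X)].
grid = np.linspace(-6.0, 6.0, 2001)
dx = grid[1] - grid[0]
vals = np.stack([p(grid) for p in estimators])   # shape (M, n_grid)
G = vals @ vals.T * dx                           # grid approximation of the Gram matrix
b = np.array([p(valid).mean() for p in estimators])

# Convex aggregation: minimize the criterion over the simplex {lam >= 0, sum lam = 1}.
res = minimize(
    lambda lam: lam @ G @ lam - 2.0 * b @ lam,
    x0=np.full(M, 1.0 / M),
    bounds=[(0.0, 1.0)] * M,
    constraints=[{"type": "eq", "fun": lambda lam: lam.sum() - 1.0}],
    method="SLSQP",
)
weights = res.x
aggregate = lambda t: sum(w * p(t) for w, p in zip(weights, estimators))
```

Dropping the simplex constraint and optimizing over unrestricted weight vectors would give a linear rather than convex aggregate, the other regime studied in the paper.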





Author information

Corresponding author

Correspondence to Ph. Rigollet.

About this article

Cite this article

Rigollet, P., Tsybakov, A.B. Linear and convex aggregation of density estimators. Math. Methods Statist. 16, 260–280 (2007).



Key words

  • aggregation
  • oracle inequalities
  • statistical learning
  • nonparametric density estimation
  • sharp minimax adaptivity
  • kernel estimates of a density

2000 Mathematics Subject Classification

  • primary 62G07
  • secondary 62G05
  • 68T05
  • 62G20