A non-classical parameterization for density estimation using sample moments

  • Regular Article
  • Statistical Papers

Abstract

Probability density estimation is a core problem in statistics and data science. Moment methods are an important means of density estimation, but they generally depend strongly on the choice of feasible functions, which severely affects their performance. In this paper, we propose a non-classical parametrization for density estimation using sample moments, which does not require the choice of such functions. The parametrization is induced by the squared Hellinger distance; the solution minimizing it is proved to exist and be unique, subject to a simple prior that does not depend on the data, and it can be obtained by convex optimization. Statistical properties of the density estimator, together with an asymptotic upper bound on the error, are given for the estimator based on power moments. Simulation results validate the performance of the estimator through a comparison with several prevailing methods. The convergence rate of the proposed estimator is proved to be \(m^{-1/2}\) (\(m\) being the number of data samples), which is the optimal convergence rate for parametric estimators and is faster than that of nonparametric estimators. To the best of our knowledge, the proposed estimator is the first in the literature for which the power moments up to an arbitrary even order exactly match the sample moments, while the true density is not assumed to fall within specific function classes.
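The two quantities the abstract builds on, sample power moments and the squared Hellinger distance, can be illustrated with a minimal numerical sketch. This is not the estimator proposed in the paper: the function names (`sample_power_moments`, `squared_hellinger`, `gaussian_pdf`) are hypothetical, the integral is approximated on a grid with the trapezoid rule, and the convention \(H^2(p,q)=\tfrac{1}{2}\int(\sqrt{p}-\sqrt{q})^2\,dx\) is an assumption (some authors omit the factor \(\tfrac{1}{2}\)).

```python
import numpy as np

def sample_power_moments(samples, order):
    """Sample power moments of orders 0..order: (1/m) * sum_i x_i**k."""
    samples = np.asarray(samples, dtype=float)
    return np.array([np.mean(samples ** k) for k in range(order + 1)])

def squared_hellinger(p, q, x):
    """Squared Hellinger distance between densities p and q tabulated on grid x.

    Assumed convention: H^2 = 0.5 * integral of (sqrt(p) - sqrt(q))^2,
    approximated with the trapezoid rule.
    """
    integrand = (np.sqrt(p) - np.sqrt(q)) ** 2
    dx = np.diff(x)
    return 0.5 * np.sum(0.5 * (integrand[1:] + integrand[:-1]) * dx)

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution, used here only as test data."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

# Sample moments up to order 4 from m = 10,000 standard normal draws.
rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, size=10_000)
moments = sample_power_moments(samples, 4)

# Squared Hellinger distance between N(0, 1) and itself, and N(0, 1) vs N(1, 1).
x = np.linspace(-10.0, 10.0, 4001)
h2_same = squared_hellinger(gaussian_pdf(x, 0, 1), gaussian_pdf(x, 0, 1), x)
h2_shift = squared_hellinger(gaussian_pdf(x, 0, 1), gaussian_pdf(x, 1, 1), x)
```

Under the assumed convention, two unit-variance Gaussians whose means differ by 1 have the closed form \(H^2 = 1 - e^{-1/8}\), which gives a simple check on the grid approximation. The zeroth sample moment is always 1, and the higher sample moments fluctuate around the true moments with sampling error that shrinks as \(m\) grows.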

Author information

Corresponding author

Correspondence to Guangyu Wu.

About this article

Cite this article

Wu, G., Lindquist, A. A non-classical parameterization for density estimation using sample moments. Stat Papers (2024). https://doi.org/10.1007/s00362-024-01563-z

  • DOI: https://doi.org/10.1007/s00362-024-01563-z
