Abstract
Probability density estimation is a core problem in statistics and data science. Moment methods are an important means of density estimation, but their performance generally depends strongly on the choice of feasible functions. In this paper, we propose a non-classical parametrization for density estimation using sample moments that does not require choosing such functions. The parametrization is induced by the squared Hellinger distance; its minimizer is proved to exist and to be unique subject to a simple prior that does not depend on the data, and it can be obtained by convex optimization. Statistical properties of the density estimator based on power moments are presented, together with an asymptotic upper bound on its error. Simulation results validate the performance of the estimator in comparison with several prevailing methods. The convergence rate of the proposed estimator is proved to be \(m^{-1/2}\), where \(m\) is the number of data samples; this is the optimal convergence rate for parametric estimators and is faster than that of nonparametric estimators. To the best of our knowledge, the proposed estimator is the first in the literature for which the power moments up to an arbitrary even order exactly match the sample moments, while the true density is not assumed to fall within specific function classes.
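To make the two quantities named in the abstract concrete, the following minimal sketch computes sample power moments and a grid approximation of the squared Hellinger distance. It is not the proposed estimator: the helper names (`sample_power_moments`, `squared_hellinger`, `gaussian_pdf`), the Gaussian test densities, the grid, and the convention \(H^2(p,q)=\int(\sqrt{p}-\sqrt{q})^2\,dx\) (without a factor of 1/2) are all assumptions made for illustration.

```python
import numpy as np

def sample_power_moments(samples, order):
    """Return the sample power moments of orders 0..order (order 0 is always 1)."""
    samples = np.asarray(samples, dtype=float)
    return np.array([np.mean(samples ** k) for k in range(order + 1)])

def squared_hellinger(p, q, grid):
    """Riemann-sum approximation of the integral of (sqrt(p) - sqrt(q))^2.

    Assumes a uniform grid and the convention without the 1/2 factor.
    """
    dx = grid[1] - grid[0]
    return float(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) * dx)

def gaussian_pdf(x, mean, std):
    """Gaussian density, used here only as an illustrative test case."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

# Illustrative data: m = 10,000 draws from a standard normal (an assumption).
rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=10_000)
moments = sample_power_moments(samples, order=4)  # even order, as in the abstract

# Squared Hellinger distance between N(0, 1) and N(1, 1) on a wide grid.
grid = np.linspace(-10.0, 10.0, 4001)
p = gaussian_pdf(grid, 0.0, 1.0)
q = gaussian_pdf(grid, 1.0, 1.0)
h2_same = squared_hellinger(p, p, grid)
h2_shift = squared_hellinger(p, q, grid)
```

For two Gaussians with equal standard deviation \(\sigma\), this convention gives the closed form \(2\bigl(1-\exp(-(\mu_1-\mu_2)^2/(8\sigma^2))\bigr)\), which the grid value for `h2_shift` should approximate closely.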
About this article
Cite this article
Wu, G., Lindquist, A. A non-classical parameterization for density estimation using sample moments. Stat Papers (2024). https://doi.org/10.1007/s00362-024-01563-z
DOI: https://doi.org/10.1007/s00362-024-01563-z