Abstract
Density estimation raises delicate problems in higher dimensions, especially when strong convergence is required and the data marginals can be highly correlated. Modified histograms have been introduced to circumvent the problem of low bin counts when convergence is considered in the sense of information divergence. These estimates are defined from a reference probability density and an associated partition, which in the univariate case is defined from the quantiles of the reference density. In the multivariate case, the definition of the partition therefore raises an additional problem related to the lack of a total order. In this paper, we present a method for constructing modified multivariate histograms such that the corresponding partition is well adapted to the observed data. The approach is based on a data-driven coordinate system selected by cross-validation. We discuss the performance of our estimate with the help of a finite-sample simulation study.
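To make the univariate construction mentioned above concrete, the following minimal sketch builds a modified histogram of the Barron type. It is not the multivariate method of this chapter. It assumes a standard normal reference density, and it assumes the usual mixture form in which a cell holding N of the n observations receives probability (N + 1)/(n + m), with m cells of equal reference mass. The function name `modified_histogram` and its parameters are hypothetical and chosen only for illustration.

```python
import bisect
import random
from statistics import NormalDist


def modified_histogram(sample, m, ref=NormalDist()):
    """Univariate modified histogram on m cells of equal reference mass.

    Assumption: the reference distribution `ref` is standard normal unless
    another NormalDist is passed in.
    """
    n = len(sample)
    # Interior cell boundaries are the i/m quantiles of the reference density,
    # so every cell has reference probability 1/m.
    edges = [ref.inv_cdf(i / m) for i in range(1, m)]
    # Count the observations falling in each of the m cells.
    counts = [0] * m
    for x in sample:
        counts[bisect.bisect_right(edges, x)] += 1

    def density(x):
        cell = bisect.bisect_right(edges, x)
        # Reference density rescaled so that the cell has mass
        # (counts + 1) / (n + m); empty cells keep positive mass.
        return ref.pdf(x) * m * (counts[cell] + 1) / (n + m)

    return density, edges, counts


if __name__ == "__main__":
    random.seed(0)
    data = [random.gauss(0.5, 1.5) for _ in range(200)]
    f_hat, edges, counts = modified_histogram(data, m=8)
    print(counts)
    print(f_hat(0.0))
```

Because every cell keeps a strictly positive mass, the estimate never vanishes where the reference density is positive, which is what keeps the information divergence finite even when some bin counts are zero.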
Copyright information
© 2005 Springer Science+Business Media, Inc.
About this chapter
Cite this chapter
Berlinet, A., Rouvière, L. (2005). Effective Construction of Modified Histograms in Higher Dimensions. In: Duchesne, P., Rémillard, B. (eds) Statistical Modeling and Analysis for Complex Data Problems. Springer, Boston, MA. https://doi.org/10.1007/0-387-24555-3_6
DOI: https://doi.org/10.1007/0-387-24555-3_6
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-24554-6
Online ISBN: 978-0-387-24555-3
eBook Packages: Mathematics and Statistics (R0)