
Effective Construction of Modified Histograms in Higher Dimensions

Statistical Modeling and Analysis for Complex Data Problems

Abstract

Density estimation raises delicate problems in higher dimensions, especially when strong convergence is required and the data marginals may be highly correlated. Modified histograms were introduced to circumvent the problem of low bin counts when convergence is considered in the sense of information divergence. These estimates are defined from a reference probability density and an associated partition which, in the univariate case, is built from the quantiles of the reference density. In the multivariate case, the definition of the partition therefore raises an additional problem related to the lack of a total order. In this paper, we present a method for constructing modified multivariate histograms such that the corresponding partition is well adapted to the observed data. The approach is based on a data-driven coordinate system selected by cross-validation. We discuss the performance of our estimate with the help of a finite-sample simulation study.
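
To make the univariate construction concrete, here is a minimal sketch of a modified histogram. The abstract does not state the formula, so the smoothed form used here, (count + 1) / (n + m) per cell, is an assumption based on the estimator commonly attributed to Barron; the standard normal reference density, the function name modified_histogram, and the parameter m (number of cells) are likewise illustrative choices, not taken from the chapter. The m cells are bounded by the quantiles of the reference density, so each has reference probability 1/m.

```python
import random
from statistics import NormalDist


def modified_histogram(sample, m, ref=NormalDist()):
    """Sketch of a univariate modified histogram (assumed Barron-type form).

    The real line is split into m cells of equal probability 1/m under the
    reference distribution `ref`, i.e. the cell edges are its quantiles.
    """
    n = len(sample)
    counts = [0] * m
    for x in sample:
        # The cell containing x is floor(m * G(x)), with G the reference CDF.
        k = min(int(m * ref.cdf(x)), m - 1)
        counts[k] += 1

    def density(x):
        k = min(int(m * ref.cdf(x)), m - 1)
        # Smoothed cell mass (count + 1) / (n + m) is never zero, even for
        # empty cells. Dividing by the reference cell mass 1/m and multiplying
        # by the reference density g(x) turns it into a density value.
        return m * (counts[k] + 1) / (n + m) * ref.pdf(x)

    return density


if __name__ == "__main__":
    random.seed(0)
    data = [random.gauss(1.0, 0.5) for _ in range(500)]
    f_hat = modified_histogram(data, m=20)
    print(f_hat(1.0), f_hat(4.0))  # strictly positive everywhere
```

Because every cell receives a strictly positive mass, the estimate never vanishes where the reference density is positive, which is what keeps an information divergence between the true density and the estimate finite. In one dimension the quantiles give an obvious ordering of cells; the multivariate case has no such ordering, which is the difficulty the chapter addresses.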

© 2005 Springer Science+Business Media, Inc.

Cite this chapter

Berlinet, A., Rouvière, L. (2005). Effective Construction of Modified Histograms in Higher Dimensions. In: Duchesne, P., Rémillard, B. (eds) Statistical Modeling and Analysis for Complex Data Problems. Springer, Boston, MA. https://doi.org/10.1007/0-387-24555-3_6
