Data compression and histograms
  • Published: June 1992

  • Bin Yu (1)
  • T. P. Speed (2)

Probability Theory and Related Fields, volume 92, pages 195–229 (1992)

Summary

This paper considers the relationship between code length and the selection of the number of bins for a histogram density, for a sequence of iid observations on [0,1]. First, we use a shortest code length criterion to select the number of bins for a histogram. A uniform almost sure asymptotic expansion for the code length is given and is used to prove the asymptotic optimality of the selection rule. In addition, the selection rule is consistent if the true density is uniform on [0,1]. Second, we address the question: what is the “best” achievable average code length when the underlying density function is f? Minimax lower bounds are derived for the average code length over certain smooth classes of underlying densities f. For the smooth class with bounded first derivatives, the rate in the lower bound is shown to be achieved by a code based on a sequence of histograms whose number of bins is changed predictively. Moreover, this best code can be modified to ensure that the almost sure version of the code length has asymptotically the same behavior as its expected value, i.e., the average code length.
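
To make the idea of choosing the number of bins by shortest code length concrete, here is a minimal sketch. It is not the paper's criterion, which the summary does not spell out. It assumes equal-width bins on [0,1] and a sequential (predictive) code built from add-one (Laplace) bin probabilities. The reported length is measured relative to coding with the uniform density, so 0 bits corresponds to one bin and negative values are savings. The function names `predictive_code_length` and `select_bins` are hypothetical.

```python
import math


def predictive_code_length(xs, m):
    """Code length in bits, relative to the uniform density on [0,1], of xs
    under an m-bin equal-width histogram coded sequentially.

    Assumption: add-one (Laplace) predictive bin probabilities, chosen for
    illustration only.
    """
    counts = [0] * m
    bits = 0.0
    for t, x in enumerate(xs):
        j = min(int(x * m), m - 1)  # bin index; x == 1.0 goes to the last bin
        # Predictive density at x given the first t points:
        # bin probability (counts[j] + 1) / (t + m) divided by bin width 1/m.
        density = m * (counts[j] + 1) / (t + m)
        bits -= math.log2(density)
        counts[j] += 1
    return bits


def select_bins(xs, max_bins):
    """Return the number of bins in 1..max_bins with the shortest code length."""
    return min(range(1, max_bins + 1),
               key=lambda m: predictive_code_length(xs, m))
```

With one bin the predictive density is identically 1, so the relative code length is 0 bits; data concentrated on part of [0,1] gives a negative length for a suitable m > 1, and `select_bins` then picks more than one bin.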

Author information

Authors and Affiliations

  1. Department of Statistics, University of Wisconsin, Madison, WI 53706, USA

    Bin Yu

  2. Department of Statistics, University of California, Berkeley, CA 94720, USA

    T. P. Speed

Additional information

Research supported in part by NSF grant DMS-8701426

Research supported in part by NSF grant DMS-8802378

About this article

Cite this article

Yu, B., Speed, T.P. Data compression and histograms. Probab. Th. Rel. Fields 92, 195–229 (1992). https://doi.org/10.1007/BF01194921

  • Received: 23 August 1990

  • Revised: 07 October 1991

  • Issue Date: June 1992

  • DOI: https://doi.org/10.1007/BF01194921

AMS 1980 Classifications

  • 60G05
  • 94A99
  • 94A17