Statistics and Computing, Volume 13, Issue 2, pp 91–100

Single-pass low-storage arbitrary quantile estimation for massive datasets

  • John C. Liechty
  • Dennis K. J. Lin
  • James P. McDermott
Article

Abstract

We present a single-pass, low-storage, sequential method for estimating an arbitrary quantile of an unknown distribution. The proposed method performs very well when compared with existing methods for estimating the median, as well as arbitrary quantiles, across a wide range of densities. In addition to explaining the method and presenting the results of the simulation study, we discuss the intuition behind the method and demonstrate empirically, for certain densities, that the proposed estimator converges to the sample quantile.
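The abstract does not spell out the update rule of the proposed estimator, so the following is only a minimal sketch of what a single-pass, low-storage quantile estimator looks like in practice. It uses a standard Robbins-Monro stochastic-approximation recursion, not the authors' method. The sketch keeps one running estimate and a counter: when an observation exceeds the estimate, the estimate moves up by the gain times p; otherwise it moves down by the gain times 1 − p, with the gain shrinking like 1/n. The function name `sa_quantile` and the tuning constant `step_scale` are hypothetical. The sketch assumes `step_scale` is set near 1/f(q_p), where f is the density at the target quantile; that value is exactly 1 for the uniform data in the demo.

```python
import random
from typing import Iterable


def sa_quantile(stream: Iterable[float], p: float, step_scale: float = 1.0) -> float:
    """Single-pass, O(1)-storage estimate of the p-th quantile of a stream.

    Illustrative Robbins-Monro stochastic-approximation update, NOT the
    estimator proposed in the paper. `step_scale` is a hypothetical tuning
    constant that works best near 1 / f(q_p), where f is the unknown density.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    q = None  # current estimate: the only state besides the counter
    n = 0     # number of observations seen so far
    for x in stream:
        n += 1
        if q is None:
            q = x  # initialise with the first observation
            continue
        # Up by gain*p when x > q, down by gain*(1 - p) when x <= q.
        indicator = 1.0 if x <= q else 0.0
        q += (step_scale / n) * (p - indicator)
    if q is None:
        raise ValueError("stream is empty")
    return q


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.random() for _ in range(200_000)]
    est = sa_quantile(iter(data), 0.9)
    # Sorting the full sample is done only to check the estimate.
    exact = sorted(data)[int(0.9 * len(data))]
    print(f"streaming estimate {est:.4f}, sample quantile {exact:.4f}")
```

On uniform data the streaming estimate should land close to the sample quantile. Because f is unknown in practice, a poorly chosen fixed gain can make convergence slow, especially for tail quantiles where the density is small.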

Keywords: low-storage quantile estimation; single-pass algorithms; data mining; large datasets; tail quantile

Copyright information

© Kluwer Academic Publishers 2003

Authors and Affiliations

  • John C. Liechty, Department of Marketing, Pennsylvania State University, University Park, USA
  • Dennis K. J. Lin, Department of Supply Chain and Information Systems, University Park, USA
  • James P. McDermott, Department of Statistics, Pennsylvania State University, University Park, USA

Personalised recommendations