Estimating Sum by Weighted Sampling

  • Conference paper
Automata, Languages and Programming (ICALP 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4596)

Abstract

We study the classic problem of estimating the sum of n variables. The traditional uniform sampling approach requires a linear number of samples to provide any non-trivial guarantee on the estimated sum. In this paper we consider various sampling methods besides uniform sampling, in particular sampling a variable with probability proportional to its value, referred to as linear weighted sampling. If only linear weighted sampling is allowed, we show an algorithm for estimating the sum with \(\tilde{O}(\sqrt{n})\) samples, and it is almost optimal in the sense that \(\Omega(\sqrt{n})\) samples are necessary for any reasonable sum estimator. If both uniform sampling and linear weighted sampling are allowed, we show a sum estimator with \(\tilde{O}(\sqrt[3]{n})\) samples. More generally, we may allow general weighted sampling, where the probability of sampling a variable is proportional to any function of its value. We prove a lower bound of \(\Omega(\sqrt[3]{n})\) samples for any reasonable sum estimator using general weighted sampling, which implies that our algorithm combining uniform and linear weighted sampling is an almost optimal sum estimator.
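
The sampling models named in the abstract can be illustrated with a small Python sketch. This is not the paper's algorithm, which the abstract does not spell out. It is a toy illustration under a strong, hypothetical assumption: all nonzero values are equal, so linear weighted samples are uniform over the nonzero items and their count can be estimated from repeated samples (collisions), in the style of the birthday paradox. All function names here are made up for illustration.

```python
import random
from collections import Counter


def linear_weighted_sample(values, k, rng):
    """Draw k indices; index i is chosen with probability values[i] / sum(values)."""
    return rng.choices(range(len(values)), weights=values, k=k)


def uniform_estimate(values, k, rng):
    """Classic uniform-sampling estimator: n times the mean of k sampled values."""
    n = len(values)
    picks = [values[rng.randrange(n)] for _ in range(k)]
    return n * sum(picks) / k


def collision_estimate_equal_values(values, k, rng):
    """Estimate the sum from k linear weighted samples.

    ASSUMPTION (illustrative only): every nonzero value is the same number v.
    Then the samples are uniform over the m nonzero items, and the expected
    number of colliding sample pairs is k*(k-1) / (2*m). Solving for m and
    multiplying by v gives an estimate of the sum.
    """
    idx = linear_weighted_sample(values, k, rng)
    counts = Counter(idx)
    collisions = sum(c * (c - 1) // 2 for c in counts.values())
    if collisions == 0:
        return None  # too few samples to observe any collision
    m_hat = k * (k - 1) / (2 * collisions)
    v = values[idx[0]]  # common nonzero value, read off any sampled item
    return v * m_hat


if __name__ == "__main__":
    rng = random.Random(0)

    # One heavy item among zeros: uniform sampling usually misses it,
    # while linear weighted sampling always returns it.
    sparse = [0] * 9999 + [7]
    print(uniform_estimate(sparse, 100, rng))
    print(linear_weighted_sample(sparse, 5, rng))

    # Equal values: a number of samples proportional to sqrt(n) already
    # produces collisions, which is what the toy estimator relies on.
    flat = [3] * 10000
    print(collision_estimate_equal_values(flat, 2000, rng))
```

The sample size of 2000 for n = 10000 corresponds to 20·√n. The constant is an arbitrary choice to make collisions frequent enough for a stable estimate, not a value taken from the paper.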

Editor information

Lars Arge, Christian Cachin, Tomasz Jurdziński, Andrzej Tarlecki

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Motwani, R., Panigrahy, R., Xu, Y. (2007). Estimating Sum by Weighted Sampling. In: Arge, L., Cachin, C., Jurdziński, T., Tarlecki, A. (eds) Automata, Languages and Programming. ICALP 2007. Lecture Notes in Computer Science, vol 4596. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73420-8_7

  • DOI: https://doi.org/10.1007/978-3-540-73420-8_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73419-2

  • Online ISBN: 978-3-540-73420-8

  • eBook Packages: Computer Science, Computer Science (R0)
