Approximate Set Union via Approximate Randomization

  • Conference paper
Computing and Combinatorics (COCOON 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12273)

Abstract

We develop a randomized approximation algorithm for the size of a set union \(|A_1\cup A_2\cup \ldots \cup A_{m}|\). The input is a list of sets \(A_1,\,\ldots ,\,A_{m}\), where each \(A_i\) comes with an approximate set size \(m_i\in \left( (1-\beta _L)|A_i|,\, (1+\beta _R)|A_i|\right) \) and a biased random generator with probability \(\mathrm{Prob}\left( x=\mathrm{RandomElement}(A_i)\right) \in \left[ \frac{1-\alpha _L}{|A_i|},\, \frac{1+\alpha _R}{|A_i|}\right] \) for each element \(x\in A_i\), where \(i=1,\, 2,\, \ldots ,\,m\) and \(\alpha _L,\, \alpha _R,\, \beta _L,\,\beta _R\in (0,\,1)\). The approximation ratio for \(|A_1\cup A_2\cup \ldots \cup A_m|\) is in the range \(\left[ (1-\epsilon )(1-\alpha _L)(1-\beta _L),\, (1+\epsilon )(1+\alpha _R)(1+\beta _R)\right] \) for any \(\epsilon \in (0,\,1)\). The complexity of the algorithm is measured by both time complexity and round complexity. One round of the algorithm makes non-adaptive accesses to the \(\mathrm{RandomElement}(A_i)\) functions and membership queries (\(x\in A_i\)?) to the input sets \(A_i\), with \(1\le i\le m\). Our algorithm gives an approximation scheme with \(O(m\cdot (\log m)^{7})\) running time and \(O(\log m)\) rounds, in contrast to the existing algorithm [18], which needs \(\Omega (m)\) rounds in the worst case with \(O((1+\epsilon )m/\epsilon ^2)\) running time, where \(m\) is the number of sets. Our algorithm also gives a flexible tradeoff with time complexity \(O\left( m^{1+\xi }\right) \) and round complexity \(O\left( \frac{1}{\xi }\right) \) for any \(\xi \in (0,\,1)\). It runs in sublinear time under the condition that each element in \(A_1\cup A_2\cup \ldots \cup A_{m}\) belongs to \(m^a\) sets for any fixed \(a>0\). To the best of our knowledge, no sublinear results for this problem have appeared before.
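To make the access model in the abstract concrete, the following Python sketch shows a simplified coverage-based Monte Carlo estimator that uses only the two operations mentioned there: drawing a random element of a set and asking membership queries. It is an illustrative baseline in the spirit of the classical approach, not the algorithm of this paper and not a faithful reproduction of [18]. The function name `estimate_union_size`, its parameters, and the use of exact sizes and unbiased sampling (that is, \(\alpha _L=\alpha _R=\beta _L=\beta _R=0\)) are assumptions made for the example.

```python
import random


def estimate_union_size(sets, num_samples=20000, seed=0):
    """Coverage-based Monte Carlo estimate of |A_1 ∪ ... ∪ A_m|.

    Hypothetical helper for illustration only: exact set sizes and
    unbiased sampling are assumed, unlike the relaxed model in the paper.
    """
    rng = random.Random(seed)
    pools = [list(s) for s in sets if s]   # stands in for RandomElement(A_i)
    sizes = [len(p) for p in pools]        # plays the role of the sizes m_i
    total = sum(sizes)
    if total == 0:
        return 0.0

    acc = 0.0
    for _ in range(num_samples):
        # Choose a set with probability proportional to its size,
        # then a uniform element of it: (i, x) is a uniform pair.
        i = rng.choices(range(len(pools)), weights=sizes, k=1)[0]
        x = rng.choice(pools[i])
        # Membership queries: count how many input sets contain x.
        cover = sum(1 for s in sets if x in s)
        acc += 1.0 / cover

    # E[1 / cover] = |union| / total, so rescale the sample mean by total.
    return total * acc / num_samples


if __name__ == "__main__":
    example = [set(range(0, 60)), set(range(40, 100)), set(range(90, 120))]
    print(estimate_union_size(example))  # true union size is 120
```

Each sample issues a membership query to every input set, so this sketch spends time linear in \(m\) per sample; it illustrates the query model only and makes no attempt to match the running time or round bounds stated in the abstract.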

References

  1. Alon, N., Matias, Y., Szegedy, M.: The space complexity of approximating the frequency moments. In: Proceedings of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing, Philadelphia, Pennsylvania, USA, 22–24 May 1996, pp. 20–29 (1996)

  2. Bar-Yossef, Z., Jayram, T.S., Kumar, R., Sivakumar, D., Trevisan, L.: Counting distinct elements in a data stream. In: Rolim, J.D.P., Vadhan, S. (eds.) RANDOM 2002. LNCS, vol. 2483, pp. 1–10. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45726-7_1

  3. Bar-Yossef, Z., Kumar, R., Sivakumar, D.: Reductions in streaming algorithms, with an application to counting triangles in graphs. In: Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 6–8 January 2002, San Francisco, CA, USA, pp. 623–632 (2002)

  4. Blasiok, J.: Optimal streaming and tracking distinct elements with high probability. In: Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018, New Orleans, LA, USA, 7–10 January 2018, pp. 2432–2448 (2018)

  5. Bringmann, K., Friedrich, T.: Approximating the volume of unions and intersections of high-dimensional geometric objects. Comput. Geom. 43(6–7), 601–610 (2010)

  6. Buss, S.R., Hay, L.: On truth-table reducibility to SAT and the difference hierarchy over NP. In: Proceedings: Third Annual Structure in Complexity Theory Conference, Georgetown University, Washington, D. C., USA, 14–17 June 1988, pp. 224–233 (1988)

  7. Cook, S.A.: The complexity of theorem-proving procedures. In: Proceedings of the 3rd Annual ACM Symposium on Theory of Computing, 3–5 May 1971, Shaker Heights, Ohio, USA, pp. 151–158 (1971)

  8. Flajolet, P., Fusy, É., Gandoue, O., Meunier, F.: HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. In: 2007 Conference on Analysis of Algorithms, AofA 2007, pp. 127–146 (2007)

  9. Flajolet, P., Martin, G.N.: Probabilistic counting algorithms for data base applications. J. Comput. Syst. Sci. 31(2), 182–209 (1985)

  10. Fortnow, L., Reingold, N.: PP is closed under truth-table reductions. In: Proceedings of the Sixth Annual Structure in Complexity Theory Conference, Chicago, Illinois, USA, June 30–July 3 1991, pp. 13–15 (1991)

  11. Ganguly, S., Garofalakis, M.N., Rastogi, R.: Tracking set-expression cardinalities over continuous update streams. VLDB J. 13(4), 354–369 (2004)

  12. Gibbons, P.B.: Distinct sampling for highly-accurate answers to distinct values queries and event reports. In: VLDB 2001, Proceedings of 27th International Conference on Very Large Data Bases, 11–14 September 2001, Roma, Italy, pp. 541–550 (2001)

  13. Gibbons, P.B., Tirthapura, S.: Estimating simple functions on the union of data streams. In: SPAA, pp. 281–291 (2001)

  14. Haas, P.J., Naughton, J.F., Seshadri, S., Stokes, L.: Sampling-based estimation of the number of distinct values of an attribute. In: VLDB 1995, Proceedings of 21th International Conference on Very Large Data Bases, 11–15 September 1995, Zurich, Switzerland, pp. 311–322 (1995)

  15. Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963)

  16. Huang, Z., Tai, W.M., Yi, K.: Tracking the frequency moments at all times. CoRR, abs/1412.1763 (2014)

  17. Kane, D.M., Nelson, J., Woodruff, D.P.: An optimal algorithm for the distinct elements problem. In: Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2010, 6–11 June 2010, Indianapolis, Indiana, USA, pp. 41–52 (2010)

  18. Karp, R.M., Luby, M., Madras, N.: Monte-carlo approximation algorithms for enumeration problems. J. Algorithms 10(3), 429–448 (1989)

  19. Motwani, R., Raghavan, P.: Randomized Algorithms. Cambridge University Press, Cambridge (2000)

  20. Valiant, L.G.: The complexity of computing the permanent. Theor. Comput. Sci. 8, 189–201 (1979)

Acknowledgement

This research is supported in part by National Science Foundation Early Career Award 0845376, Bensten Fellowship of the University of Texas Rio Grande Valley, and National Natural Science Foundation of China 61772179.

Author information

Corresponding author

Correspondence to Pengfei Gu.

Copyright information

© 2020 Springer Nature Switzerland AG

Cite this paper

Fu, B., Gu, P., Zhao, Y. (2020). Approximate Set Union via Approximate Randomization. In: Kim, D., Uma, R., Cai, Z., Lee, D. (eds) Computing and Combinatorics. COCOON 2020. Lecture Notes in Computer Science, vol 12273. Springer, Cham. https://doi.org/10.1007/978-3-030-58150-3_48

  • DOI: https://doi.org/10.1007/978-3-030-58150-3_48

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58149-7

  • Online ISBN: 978-3-030-58150-3

  • eBook Packages: Computer Science, Computer Science (R0)
