Abstract
We develop a randomized approximation algorithm for the size of set union problem, which asks for \(|A_1\cup A_2\cup \ldots \cup A_m|\). The input is a list of sets \(A_1,\,\ldots,\,A_m\), an approximate size \(m_i\) for each \(A_i\) with \(m_i\in \left((1-\beta_L)|A_i|,\, (1+\beta_R)|A_i|\right)\), and a biased random generator for each \(A_i\) with \(\mathrm{Prob}\left(x=\mathrm{RandomElement}(A_i)\right)\in \left[\frac{1-\alpha_L}{|A_i|},\, \frac{1+\alpha_R}{|A_i|}\right]\) for every element \(x\in A_i\), where \(i=1,\,2,\,\ldots,\,m\) and \(\alpha_L,\,\alpha_R,\,\beta_L,\,\beta_R\in (0,\,1)\). The approximation ratio for \(|A_1\cup A_2\cup \ldots \cup A_m|\) is in the range \([(1-\epsilon)(1-\alpha_L)(1-\beta_L),\, (1+\epsilon)(1+\alpha_R)(1+\beta_R)]\) for any \(\epsilon\in (0,\,1)\). The complexity of the algorithm is measured by both time complexity and round complexity. One round of the algorithm makes non-adaptive accesses to the functions \(\mathrm{RandomElement}(A_i)\) and non-adaptive membership queries (\(x\in A_i\)?) to the input sets \(A_i\), for \(1\le i\le m\). Our algorithm gives an approximation scheme with \({O}(m\cdot (\log m)^{7})\) running time and \({O}(\log m)\) rounds, in contrast to the existing algorithm [18], which needs \(\varOmega(m)\) rounds in the worst case with \({O}((1+\epsilon)m/\epsilon^2)\) running time, where \(m\) is the number of sets. Our algorithm also gives a flexible tradeoff, with time complexity \({O}\left(m^{1+\xi}\right)\) and round complexity \({O}\left(\frac{1}{\xi}\right)\) for any \(\xi\in (0,\,1)\). Our algorithm runs in sublinear time under the condition that each element in \(A_1\cup A_2\cup \ldots \cup A_m\) belongs to \(m^a\) sets for any fixed \(a>0\). To the best of our knowledge, no sublinear results about this problem have appeared before.
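To make the access model concrete, the following Python sketch shows a simple Monte Carlo coverage estimator that uses only the two operations named in the abstract: drawing a random element of a set and asking membership queries. It is an illustration under stated assumptions, not the algorithm of this paper and not the exact procedure of [18]: it assumes exact set sizes and unbiased generators (the idealized case \(\alpha_L=\alpha_R=\beta_L=\beta_R=0\)), and the function name `estimate_union_size` and its parameters are hypothetical.

```python
import random


def estimate_union_size(sets, num_samples, rng=None):
    """Estimate |A_1 ∪ ... ∪ A_m| with a coverage-style Monte Carlo estimator.

    Assumptions (not from the paper): set sizes are exact and sampling is
    uniform. Each sample picks a set with probability proportional to its
    size, draws a uniform element x from it, and counts how many input
    sets contain x using membership queries.
    """
    rng = rng or random.Random()
    pools = [list(s) for s in sets]      # stand-in for RandomElement(A_i)
    sizes = [len(s) for s in sets]       # stand-in for the sizes m_i
    total = sum(sizes)
    if total == 0:
        return 0.0

    acc = 0.0
    for _ in range(num_samples):
        # Choose set i with probability |A_i| / sum_j |A_j|.
        i = rng.choices(range(len(sets)), weights=sizes, k=1)[0]
        x = rng.choice(pools[i])
        # Membership queries: how many sets cover x?
        cover = sum(1 for s in sets if x in s)
        acc += 1.0 / cover

    # E[1 / cover] = |union| / total, so this product is unbiased.
    return total * acc / num_samples


if __name__ == "__main__":
    example_sets = [set(range(0, 60)), set(range(40, 100)), set(range(90, 120))]
    print(estimate_union_size(example_sets, 20000, random.Random(1)))
```

Each sample in this sketch issues \(m\) membership queries, so it does not attempt to match the running time or round bounds stated above; it only illustrates how random generators and membership queries combine into an estimate of the union size.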
References
Alon, N., Matias, Y., Szegedy, M.: The space complexity of approximating the frequency moments. In: Proceedings of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing, Philadelphia, Pennsylvania, USA, 22–24 May 1996, pp. 20–29 (1996)
Bar-Yossef, Z., Jayram, T.S., Kumar, R., Sivakumar, D., Trevisan, L.: Counting distinct elements in a data stream. In: Rolim, J.D.P., Vadhan, S. (eds.) RANDOM 2002. LNCS, vol. 2483, pp. 1–10. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45726-7_1
Bar-Yossef, Z., Kumar, R., Sivakumar, D.: Reductions in streaming algorithms, with an application to counting triangles in graphs. In: Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 6–8 January 2002, San Francisco, CA, USA, pp. 623–632 (2002)
Blasiok, J.: Optimal streaming and tracking distinct elements with high probability. In: Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018, New Orleans, LA, USA, 7–10 January 2018, pp. 2432–2448 (2018)
Bringmann, K., Friedrich, T.: Approximating the volume of unions and intersections of high-dimensional geometric objects. Comput. Geom. 43(6–7), 601–610 (2010)
Buss, S.R., Hay, L.: On truth-table reducibility to SAT and the difference hierarchy over NP. In: Proceedings: Third Annual Structure in Complexity Theory Conference, Georgetown University, Washington, D. C., USA, 14–17 June 1988, pp. 224–233 (1988)
Cook, S.A.: The complexity of theorem-proving procedures. In: Proceedings of the 3rd Annual ACM Symposium on Theory of Computing, 3–5 May 1971, Shaker Heights, Ohio, USA, pp. 151–158 (1971)
Flajolet, P., Fusy, É., Gandoue, O., Meunier, F.: HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. In: 2007 Conference on Analysis of Algorithms, AofA 2007, pp. 127–146 (2007)
Flajolet, P., Martin, G.N.: Probabilistic counting algorithms for data base applications. J. Comput. Syst. Sci. 31(2), 182–209 (1985)
Fortnow, L., Reingold, N.: PP is closed under truth-table reductions. In: Proceedings of the Sixth Annual Structure in Complexity Theory Conference, Chicago, Illinois, USA, June 30–July 3 1991, pp. 13–15 (1991)
Ganguly, S., Garofalakis, M.N., Rastogi, R.: Tracking set-expression cardinalities over continuous update streams. VLDB J. 13(4), 354–369 (2004)
Gibbons, P.B.: Distinct sampling for highly-accurate answers to distinct values queries and event reports. In: VLDB 2001, Proceedings of 27th International Conference on Very Large Data Bases, 11–14 September 2001, Roma, Italy, pp. 541–550 (2001)
Gibbons, P.B., Tirthapura, S.: Estimating simple functions on the union of data streams. In: SPAA, pp. 281–291 (2001)
Haas, P.J., Naughton, J.F., Seshadri, S., Stokes, L.: Sampling-based estimation of the number of distinct values of an attribute. In: VLDB 1995, Proceedings of 21th International Conference on Very Large Data Bases, 11–15 September 1995, Zurich, Switzerland, pp. 311–322 (1995)
Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963)
Huang, Z., Tai, W.M., Yi, K.: Tracking the frequency moments at all times. CoRR, abs/1412.1763 (2014)
Kane, D.M., Nelson, J., Woodruff, D.P.: An optimal algorithm for the distinct elements problem. In: Proceedings of the Twenty-Ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2010, 6–11 June 2010, Indianapolis, Indiana, USA, pp. 41–52 (2010)
Karp, R.M., Luby, M., Madras, N.: Monte-Carlo approximation algorithms for enumeration problems. J. Algorithms 10(3), 429–448 (1989)
Motwani, R., Raghavan, P.: Randomized Algorithms. Cambridge University Press, Cambridge (2000)
Valiant, L.G.: The complexity of computing the permanent. Theor. Comput. Sci. 8, 189–201 (1979)
Acknowledgement
This research is supported in part by National Science Foundation Early Career Award 0845376, Bensten Fellowship of the University of Texas Rio Grande Valley, and National Natural Science Foundation of China 61772179.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Fu, B., Gu, P., Zhao, Y. (2020). Approximate Set Union via Approximate Randomization. In: Kim, D., Uma, R., Cai, Z., Lee, D. (eds) Computing and Combinatorics. COCOON 2020. Lecture Notes in Computer Science, vol 12273. Springer, Cham. https://doi.org/10.1007/978-3-030-58150-3_48
DOI: https://doi.org/10.1007/978-3-030-58150-3_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58149-7
Online ISBN: 978-3-030-58150-3
eBook Packages: Computer Science (R0)