ICALP 2009: Automata, Languages and Programming pp 328-338

# Graph Sparsification in the Semi-streaming Model

• Kook Jin Ahn
• Sudipto Guha
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5556)

## Abstract

Analyzing massive data sets has been one of the key motivations for studying streaming algorithms. In recent years, there has been significant progress in analysing distributions in a streaming setting, but the progress on graph problems has been limited. A main reason for this has been the existence of linear space lower bounds for even simple problems such as determining the connectedness of a graph. However, in many new scenarios that arise from social and other interaction networks, the number of vertices is significantly less than the number of edges. This has led to the formulation of the semi-streaming model where we assume that the space is (near) linear in the number of vertices (but not necessarily the edges), and the edges appear in an arbitrary (and possibly adversarial) order.
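To make the space regime concrete, here is a minimal sketch of a one-pass connectivity test that stores only O(n) words, namely a union-find forest over the vertices. It illustrates the semi-streaming setting described above and is not taken from the paper; the function name `count_components` and the convention that vertices are labelled `0..n-1` are assumptions made for this example.

```python
from typing import Iterable, Tuple


def count_components(n: int, edge_stream: Iterable[Tuple[int, int]]) -> int:
    """Count connected components in one pass over an edge stream.

    Illustrative sketch only: the state is a single parent array of
    length n, so memory is linear in the number of vertices and does
    not grow with the number of edges.
    """
    parent = list(range(n))

    def find(x: int) -> int:
        # Path halving: shortens the tree while walking up to the root.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = n
    for u, v in edge_stream:  # each edge is read once and never stored
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv   # merge the two components
            components -= 1
    return components
```

The graph is connected exactly when the function returns 1. The edges may arrive in any order, since merging components is order-independent.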

However, there has been limited progress in analysing graph algorithms in this model. In this paper we focus on graph sparsification, which is one of the major building blocks in a variety of graph algorithms. Further, there has been a long history of (non-streaming) sampling algorithms that provide sparse graph approximations, and it is a natural question to ask: since the end result of the sparse approximation is a small (linear) space structure, can we achieve that using small space, and in addition using a single pass over the data? The question is interesting from the standpoint of both theory and practice, and we answer it in the affirmative by providing a one-pass $$\tilde{O}(n/\epsilon^{2})$$ space algorithm that produces a sparsification that approximates each cut to within a (1 + ε) factor. We also show that $$\Omega(n \log \frac1\epsilon)$$ space is necessary for a one-pass streaming algorithm to approximate the min-cut, improving upon the Ω(n) lower bound that arises from lower bounds for testing connectivity.
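The cut guarantee can be read as follows: for every vertex set S, the total weight of sparsifier edges crossing S should be within a (1 ± ε) factor of the number of original edges crossing S. The sketch below is not the algorithm of this paper. It uses plain uniform edge sampling with reweighting by 1/p, chosen only because it is the simplest way to show the sample-and-reweight idea and how a single cut would be checked; uniform sampling is not expected to meet the guarantee on every graph. All names (`sample_sparsifier`, `cut_weight`, `within_factor`) and the parameters `p` and `seed` are hypothetical.

```python
import random
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[int, int]


def sample_sparsifier(edge_stream: Iterable[Edge], p: float,
                      seed: int = 0) -> Dict[Edge, float]:
    """Keep each streamed edge with probability p and give it weight 1/p.

    Assumption for illustration: a single uniform rate p for all edges.
    """
    rng = random.Random(seed)
    kept: Dict[Edge, float] = {}
    for u, v in edge_stream:          # one pass; rejected edges are dropped
        if rng.random() < p:
            key = (min(u, v), max(u, v))
            kept[key] = kept.get(key, 0.0) + 1.0 / p
    return kept


def cut_weight(weighted_edges: Dict[Edge, float], side: Set[int]) -> float:
    """Total weight of edges with exactly one endpoint in `side`."""
    return sum(w for (u, v), w in weighted_edges.items()
               if (u in side) != (v in side))


def within_factor(approx: float, exact: float, eps: float) -> bool:
    """Check (1 - eps) * exact <= approx <= (1 + eps) * exact."""
    return (1 - eps) * exact <= approx <= (1 + eps) * exact
```

With `p = 1.0` every edge is kept at weight 1, so `cut_weight` on the sample returns the exact cut value, which is a convenient sanity check before trying smaller sampling rates.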

## Keywords

Strong Connected Component, Strong Connectivity, Major Building Block, Streaming Algorithm, Streaming Model
These keywords were added by machine and not by the authors.
