# Graph Sparsification in the Semi-streaming Model

## Abstract

Analyzing massive data sets has been one of the key motivations for studying streaming algorithms. In recent years, there has been significant progress in analysing distributions in a streaming setting, but the progress on graph problems has been limited. A main reason for this has been the existence of linear space lower bounds for even simple problems such as determining the connectedness of a graph. However, in many new scenarios that arise from social and other interaction networks, the number of vertices is significantly less than the number of edges. This has led to the formulation of the semi-streaming model where we assume that the space is (near) linear in the number of vertices (but not necessarily the edges), and the edges appear in an arbitrary (and possibly adversarial) order.

However there has been limited progress in analysing graph algorithms in this model. In this paper we focus on graph sparsification, which is one of the major building blocks in a variety of graph algorithms. Further, there has been a long history of (non-streaming) sampling algorithms that provide sparse graph approximations and it a natural question to ask: since the end result of the sparse approximation is a small (linear) space structure, can we achieve that using a small space, and in addition using a single pass over the data? The question is interesting from the standpoint of both theory and practice and we answer the question in the affirmative, by providing a one pass \(\tilde{O}(n/\epsilon^{2})\) space algorithm that produces a sparsification that approximates each cut to a (1 + *ε*) factor. We also show that \(\Omega(n \log \frac1\epsilon)\) space is necessary for a one pass streaming algorithm to approximate the min-cut, improving upon the *Ω*(*n*) lower bound that arises from lower bounds for testing connectivity.

## Keywords

Strong Connected Component Strong Connectivity Major Building Block Streaming Algorithm Streaming Model## Preview

Unable to display preview. Download preview PDF.

## References

- 1.Alon, N., Matias, Y., Szegedy, M.: The Space Complexity of Approximating the Frequency Moments. J. Comput. Syst. Sci. 58(1), 137–147 (1999)MathSciNetCrossRefzbMATHGoogle Scholar
- 2.Benczúr, A.A., Karger, D.R.: Approximating s-t minimum cuts in O(n2) time. In: STOC 1996: Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, pp. 47–55. ACM, New York (1996)CrossRefGoogle Scholar
- 3.Chekuri, C.S., Goldberg, A.V., Karger, D.R., Levine, M.S., Stein, C.: Experimental study of minimum cut algorithms. In: SODA 1997: Proceedings of the eighth annual ACM-SIAM symposium on Discrete algorithms, Philadelphia, PA, USA, pp. 324–333. Society for Industrial and Applied Mathematics (1997)Google Scholar
- 4.Chernoff, H.: A measure of the asymptotic efficiency for tests of a hypothesis based on the sum of observations. Annals of Mathematical Statistics 23, 493–509 (1952)MathSciNetCrossRefzbMATHGoogle Scholar
- 5.Demetrescu, C., Finocchi, I., Ribichini, A.: Trading off space for passes in graph streaming problems. In: SODA, pp. 714–723 (2006)Google Scholar
- 6.Feigenbaum, J., Kannan, S., McGregor, A., Suri, S., Zhang, J.: On graph problems in a semi-streaming model. Theor. Comput. Sci. 348(2), 207–216 (2005)MathSciNetCrossRefzbMATHGoogle Scholar
- 7.Gomory, R.E., Hu, T.C.: Multi-terminal network flows. J. Soc. Indust. Appl. Math. 9(4), 551–570 (1961)MathSciNetCrossRefzbMATHGoogle Scholar
- 8.Hao, J., Orlin, J.B.: A faster algorithm for finding the minimum cut in a graph. In: SODA 1992: Proceedings of the third annual ACM-SIAM symposium on Discrete algorithms, Philadelphia, PA, USA, pp. 165–174. Society for Industrial and Applied Mathematics (1992)Google Scholar
- 9.Henzinger, M., Raghavan, P., Rajagopalan, S.: Computing on data streams (1998)Google Scholar
- 10.Karger, D.R.: Global min-cuts in rnc, and other ramifications of a simple min-out algorithm. In: SODA 1993: Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms, Philadelphia, PA, USA, pp. 21–30. Society for Industrial and Applied Mathematics (1993)Google Scholar
- 11.Karger, D.R.: Random sampling in cut, flow, and network design problems. In: STOC 1994: Proceedings of the twenty-sixth annual ACM symposium on Theory of computing, pp. 648–657. ACM Press, New York (1994)CrossRefGoogle Scholar
- 12.Karger, D.R.: Minimum cuts in near-linear time. J. ACM 47(1), 46–76 (2000)MathSciNetCrossRefzbMATHGoogle Scholar
- 13.Karger, D.R., Stein, C.: A new approach to the minimum cut problem. J. ACM 43(4), 601–640 (1996)MathSciNetCrossRefzbMATHGoogle Scholar
- 14.McGregor, A.: Finding Graph Matchings in Data Streams. In: Proc. of APPROX-RANDOM, pp. 170–181 (2005)Google Scholar
- 15.Muthukrishnan, S.: Data streams: Algorithms and Applications. Now publishers (2006)Google Scholar
- 16.Ian Munro, J., Paterson, M.: Selection and Sorting with Limited Storage. Theor. Comput. Sci. 12, 315–323 (1980)MathSciNetCrossRefzbMATHGoogle Scholar
- 17.Spielman, D.A., Srivastava, N.: Graph sparsification by effective resistances. In: STOC 2008: Proceedings of the 40th annual ACM symposium on Theory of computing, pp. 563–568. ACM Press, New York (2008)Google Scholar