
Streaming Graph Computations with a Helpful Advisor


Abstract

Motivated by the trend to outsource work to commercial cloud computing services, we consider a variation of the streaming paradigm where a streaming algorithm can be assisted by a powerful helper that can provide annotations to the data stream. We extend previous work on such annotation models by considering a number of graph streaming problems. Without annotations, streaming algorithms for graph problems generally require significant memory; we show that for many standard problems, including all graph problems that can be expressed with totally unimodular integer programming formulations, only constant space (measured in words) is needed for single-pass algorithms given linear-sized annotations. We also obtain protocols achieving essentially optimal tradeoffs between annotation length and memory usage for several important problems, including integer matrix-vector multiplication, as well as shortest s-t path in small-diameter graphs. We also obtain non-trivial tradeoffs for minimum weight bipartite perfect matching and shortest s-t path in general graphs.


References

  1. Aggarwal, G., Datar, M., Rajagopalan, S., Ruhl, M.: On the streaming model augmented with a sorting primitive. In: FOCS, pp. 540–549 (2004)

  2. Arora, S., Barak, B.: Computational Complexity—A Modern Approach. Cambridge University Press, Cambridge (2009)

  3. Bertsekas, D.: Convex Optimization Theory. Athena Scientific, Nashua (2009)

  4. Blum, M., Evans, W., Gemmell, P., Kannan, S., Naor, M.: Checking the correctness of memories. Algorithmica 90–99 (1995)

  5. Candès, E., Romberg, J., Tao, T.: Stable signal recovery from incomplete and inaccurate measurements. Commun. Pure Appl. Math. 59(8), 1207–1223 (2006)

  6. Candès, E.J., Randall, P.A.: Highly robust error correction by convex programming. IEEE Trans. Inf. Theory 54(7), 2829–2840 (2008)

  7. Chakrabarti, A., Cormode, G., McGregor, A.: Annotations in data streams. In: ICALP, pp. 222–234 (2009)

  8. Clarkson, K.L., Woodruff, D.P.: Numerical linear algebra in the streaming model. In: STOC, pp. 205–214 (2009)

  9. Das Sarma, A., Lipton, R.J., Nanongkai, D.: Best-order streaming model. In: Theory and Applications of Models of Computation, pp. 178–191 (2009)

  10. Demetrescu, C., Finocchi, I., Ribichini, A.: Trading off space for passes in graph streaming problems. ACM Trans. Algorithms 6(1) (2009)

  11. Feigenbaum, J., Kannan, S., McGregor, A., Suri, S., Zhang, J.: On graph problems in a semi-streaming model. Theor. Comput. Sci. 348(2), 207–216 (2005)

  12. Hochbaum, D.S.: Personal communication (2011)

  13. Hochbaum, D.S., Shanthikumar, J.G.: Convex separable optimization is not much harder than linear optimization. J. ACM 37, 843–862 (1990)

  14. King, V.: A simpler minimum spanning tree verification algorithm. Algorithmica 18(2), 263–270 (1997)

  15. Kleinberg, J.M., Tardos, É.: Algorithm Design. Addison-Wesley, Reading (2006)

  16. Lipton, R.J.: Fingerprinting sets. Technical Report Cs-tr-212-89, Princeton University (1989)

  17. Lipton, R.J.: Efficient checking of computations. In: STACS, pp. 207–215 (1990)

  18. McGregor, A.: Graph Mining on Streams. In: Encycl. of Database Systems. Springer, Berlin (2009)

  19. Papadimitriou, C.H., Steiglitz, K.: Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, Upper Saddle River (1982)

  20. Sarlos, T.: Improved approximation algorithms for large matrices via random projections. In: FOCS (2006)

  21. Schrijver, A.: Theory of Linear and Integer Programming. Wiley, New York (1998)

  22. Schrijver, A.: Combinatorial Optimization: Polyhedra and Efficiency (Algorithms and Combinatorics). Springer, Berlin (2004)

  23. Seidel, R.: On the all-pairs-shortest-path problem in unweighted undirected graphs. J. Comput. Syst. Sci. 51, 400–403 (1995)

  24. Tardos, E.: A strongly polynomial algorithm to solve combinatorial linear programs. Oper. Res. 34, 250–256 (1986)

  25. Yiu, M., Lin, Y., Mouratidis, K.: Efficient verification of shortest path search via authenticated hints. In: ICDE (2010)

  26. Yuster, R.: Computing the diameter polynomially faster than APSP. CoRR (2010). arXiv:1011.6181v2


Acknowledgements

We thank Moni Naor for suggesting the use of memory checking as a general approach to generating protocols. We also thank Dorit Hochbaum, Dimitri Bertsekas, Thomas Steinke, and Varun Kanade for helpful discussions, and the anonymous reviewers for suggestions that improved the clarity and completeness of this article.

Author information

Corresponding author

Correspondence to Justin Thaler.

Additional information

Work of M. Mitzenmacher was supported in part by NSF grants CCF-0915922 and CNS-0721491, and in part by grants from Yahoo! Research, Google, and Cisco, Inc.

J. Thaler supported by the Department of Defense (DoD) through the National Defense Science & Engineering Graduate Fellowship (NDSEG) Program, and in part by NSF grants CNS-0721491 and CCF-0915922.

Appendices

Appendix A: Linear Programs with Rational Entries

One might hope that Theorem 3 extends naturally to linear programs with rational entries. The following simple variant of Example 1 demonstrates that with rational data it is no longer sufficient to assume all subdeterminants of the constraint matrix are bounded in absolute value.

Example 2

Consider the linear program of Example 1, with A replaced by the matrix

Observe that all subdeterminants of this matrix are bounded (above) in absolute value by 1. Exactly as in Example 1, the unique feasible point is the vector \(\mathbf{x} \in \mathbb{Z}^{c}\) with i'th coordinate equal to \(2^{i}\), and the value of the linear program is \(\Omega(2^{c})\). \(\mathcal {V}\) cannot manipulate quantities of such magnitude exactly with less than linear space.
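To see concretely how bounded subdeterminants can coexist with exponentially large solutions once rational entries are allowed, the following sketch uses a hypothetical stand-in matrix (lower bidiagonal, with entries 1/2 and −1; Example 2's actual matrix may differ) and verifies both properties with exact rational arithmetic.

```python
from fractions import Fraction
from itertools import combinations

def det(M):
    # exact determinant by cofactor expansion (fine for the tiny matrices here)
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

c = 6
# Hypothetical stand-in: (1/2) x_1 = 1 and (1/2) x_i - x_{i-1} = 0 for i >= 2,
# i.e. a lower-bidiagonal matrix with 1/2 on the diagonal and -1 below it.
A = [[Fraction(0)] * c for _ in range(c)]
for i in range(c):
    A[i][i] = Fraction(1, 2)
    if i > 0:
        A[i][i - 1] = Fraction(-1)
b = [Fraction(1)] + [Fraction(0)] * (c - 1)

# Every subdeterminant has absolute value at most 1 ...
max_minor = max(abs(det([[A[r][s] for s in cols] for r in rows]))
                for k in range(1, c + 1)
                for rows in combinations(range(c), k)
                for cols in combinations(range(c), k))
assert max_minor <= 1

# ... yet the unique solution of A x = b has i'th coordinate 2^i.
x = []
for i in range(c):
    x.append((b[i] - sum(A[i][j] * x[j] for j in range(i))) / A[i][i])
print([int(v) for v in x])  # [2, 4, 8, 16, 32, 64]
```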

However, context often provides an a priori bound on the value of the program and the length of the optima, especially in applications to combinatorial optimization. Still, without sufficiently strong assumptions we cannot guarantee that there is an optimal solution that can be exactly specified in small precision, and \(\mathcal {H}\) must therefore send an approximate representation of the optimum. This rounded optimum is not guaranteed to be feasible for the exact LP, but will be essentially optimal for a very slightly perturbed LP.

Theorem 13

Suppose all entries of A, b, and c are rational numbers p/q for p,q∈ℤ. Assume the value of the linear program is finite, and there is a known polynomial upper bound on the length of an optimum. For any \(0 < \epsilon_{1} < 1/|A|\), there is an \(\epsilon_{2} = g(\mathcal{A}) \epsilon_{1}\), with \(g(\mathcal{A})\) independent of \(\epsilon_{1}\), such that the following is true. There is a valid \((|A| \frac{\log 1/\epsilon_{1}}{\log |A|}, \frac{\log 1/\epsilon_{1}}{\log |A|})\) protocol for obtaining an additive-\(\epsilon_{2}\) approximation to the value of the perturbed primal LP \(\max\{\mathbf{c}^{T}\mathbf{x} \mid A\mathbf{x} \leq \mathbf{b} + \epsilon_{1}\mathbf{1}\}\). In particular, if \(1/\epsilon_{1} = \mathrm{poly}(|A|)\), then we obtain an \((|A|, 1)\) protocol.

Proof

First, suppose all entries of A, b, and c are integer multiples of ϵ, for an \(\epsilon > 0\) to be determined; we remove this assumption later.

A.1 Protocol Specification

Notice that all entries of A, b and c are elements of a universe whose size is polynomial in |A| and \(1/\epsilon_{1}\), and \(\mathcal {V}\) can fingerprint all multisets as in Theorem 3 using a finite field \(\mathbb{F}_{p}\) where p is polynomial in |A| and \(1/\epsilon_{1}\). These fingerprints require \(O(\frac{\log 1/\epsilon_{1}}{\log |A|})\) words of space.
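For concreteness, the following is a minimal sketch of one standard way to fingerprint a multiset over \(\mathbb{F}_{p}\) in a single pass: hash the multiset to the polynomial \(\prod_{s}(r-s)\) evaluated at a random point r. The exact construction used in Theorem 3 may differ in its details; this sketch only illustrates why such a fingerprint fits in a constant number of field elements.

```python
import random

# One standard multiset fingerprint over F_p: map a multiset S to
# prod_{s in S} (r - s) mod p for a random r. Two distinct multisets of size
# at most n collide with probability at most n/p, and the fingerprint is
# updated with one field multiplication per stream element, so it occupies
# O(1) field elements of memory.
def make_fingerprint(p, seed=None):
    r = random.Random(seed).randrange(p)
    def fingerprint(stream):
        h = 1
        for s in stream:
            h = (h * (r - s)) % p
        return h
    return fingerprint

p = (1 << 61) - 1                    # a large prime (here the Mersenne prime 2^61 - 1)
fp = make_fingerprint(p, seed=0)
assert fp([3, 1, 2, 2]) == fp([2, 1, 2, 3])    # insensitive to order: multiset equality
assert fp([1, 2, 3]) != fp([1, 2, 4])          # distinct multisets differ (w.h.p.)
```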

Since the value of the LP is finite, there exist primal and dual optimal solutions \(\mathbf{x}^{*}\) and \(\mathbf{y}^{*}\), and by assumption all entries of \(\mathbf{x}^{*}\) and \(\mathbf{y}^{*}\) have absolute value polynomial in b and c. Let \(\hat{\mathbf {x}}^{*}\) and \(\hat{\mathbf {y}}^{*}\) denote the vectors \(\mathbf{x}^{*}\) and \(\mathbf{y}^{*}\), with all entries rounded up to integer multiples of ϵ, where \(\epsilon=\frac{\epsilon_{1}}{2u}\). Here, u∈ℤ is a strict upper bound on \(\|A\|_{\infty}\), \(\|\mathbf{c}\|_{1}\), \(\|\mathbf{b}\|_{1}\), \(\|\mathbf{x}^{*}\|_{1}\), and \(\|\mathbf{y}^{*}\|_{1}\), where throughout, if A is a matrix then \(\|A\|_{p}\) denotes the operator norm of A induced by the \(\ell_{p}\) norm on vectors, and if \(\mathbf{x}\) is a vector then \(\|\mathbf{x}\|_{p}\) denotes its \(\ell_{p}\)-norm. Notice u is polynomial in b and c, does not depend on \(\epsilon_{1}\), and can be determined by \(\mathcal {V}\) while observing the stream.

The protocol is exactly as in Theorem 3, except \(\mathcal {H}\) sends vectors \(\hat{\mathbf {x}}\) and \(\hat{\mathbf {y}}\) claimed to be \(\hat{\mathbf {x}}^{*}\) and \(\hat{\mathbf {y}}^{*}\). \(\mathcal {V}\) checks that

  1. \(\epsilon_{1}/\epsilon \geq 2u\). That is, \(\hat{\mathbf{x}}\) and \(\hat{\mathbf{y}}\) are represented at sufficiently high precision.

  2. \(\hat{\mathbf{x}}\) is feasible for the perturbed primal.

  3. \(|\mathbf{b}^{T} \hat{\mathbf{y}} - \mathbf{c}^{T} \hat{\mathbf{x}}| \leq \epsilon_{1}\).

  4. \(\hat{\mathbf{y}}\) is feasible for the LP \(\min\{\mathbf{b}^{T}\mathbf{y} \mid A^{T}\mathbf{y} \geq \mathbf{c} - \epsilon_{1}\mathbf{1},\ \mathbf{y} \geq \mathbf{0}\}\). This LP can be thought of as the perturbed dual, although it is not the dual of the perturbed primal.

If all checks pass, \(\mathcal {V}\) outputs \(\mathbf{c}^{T} \hat{\mathbf{x}}\).
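A minimal offline sketch of these four checks, with the vectors held explicitly, might look as follows. In the actual protocol \(\mathcal {V}\) performs the comparisons in small space using fingerprints of the streamed and annotated data; all names below are illustrative.

```python
import numpy as np

# Offline sketch of the verifier's four checks (dense vectors, no fingerprints).
# eps1 is the perturbation parameter, eps the rounding precision, and u the
# strict upper bound from the protocol above.
def verify_lp_annotation(A, b, c, x_hat, y_hat, eps1, eps, u):
    ok = eps1 / eps >= 2 * u                                  # 1: sufficient precision
    ok &= bool(np.all(A @ x_hat <= b + eps1))                 # 2: feasible for the perturbed primal
    ok &= abs(b @ y_hat - c @ x_hat) <= eps1                  # 3: primal and dual values nearly agree
    ok &= bool(np.all(A.T @ y_hat >= c - eps1))               # 4: perturbed dual feasibility ...
    ok &= bool(np.all(y_hat >= 0))                            #    ... including nonnegativity
    return c @ x_hat if ok else None                          # claimed LP value, or reject
```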

A.2 Protocol Validity

Write \(\mathbf{e}(\mathbf{x}^{*})= \hat{\mathbf{x}}^{*} - \mathbf{x}^{*}\) and \(\mathbf{e}(\mathbf{y}^{*})= \hat{\mathbf{y}}^{*} - \mathbf{y}^{*}\). Notice that \(|\mathbf{e}(\mathbf{x}^{*})_{i}| \leq \epsilon\) and \(|\mathbf{e}(\mathbf{y}^{*})_{i}| \leq \epsilon\) for all i. Since \(\mathbf{x}^{*}\) and \(\mathbf{y}^{*}\) are primal and dual feasible, it is easy to see \(\hat{\mathbf{x}}^{*}\) and \(\hat{\mathbf{y}}^{*}\) are feasible for the perturbed LPs. Indeed,

$$\|A\mathbf {x}^*-A\hat{\mathbf {x}}^*\|_{\infty} = \|A\mathbf{e}(\mathbf{x}^*)\|_{\infty} \leq \|A\|_{\infty}\|\mathbf{e}(\mathbf{x}^*)\|_{\infty} \leq u \epsilon \leq \epsilon_1 ,$$

and a similar argument shows \(\hat{\mathbf {y}}^{*}\) is feasible for the perturbed dual.

Since \(\mathbf{c}^{T}\mathbf{x}^{*} = \mathbf{b}^{T}\mathbf{y}^{*}\), it follows that

$$|\mathbf{b}^T \hat{\mathbf{y}}^* - \mathbf{c}^T\hat{\mathbf{x}}^*|\leq |\mathbf{c}^T\mathbf{e}(\mathbf {x}^*)| + |\mathbf{b}^T \mathbf{e}(\mathbf {y}^*)| \leq 2\epsilon u \leq \epsilon_1 .$$

Therefore, if \(\hat{\mathbf {x}}\) and \(\hat{\mathbf {y}}\) are as claimed, all of \(\mathcal {V}\)’s checks will pass.

To prove the protocol is valid, it therefore suffices to show that if vectors \(\mathbf{x}\) and \(\mathbf{y}\) pass \(\mathcal {V}\)'s checks, then \(\mathbf{c}^{T}\mathbf{x}\) is within an additive \(\epsilon_{2} = (u+2)\epsilon_{1}\) of the true value of the perturbed primal LP. \(\mathbf{c}^{T}\mathbf{x}\) is clearly a lower bound on the value of the perturbed primal, since \(\mathbf{x}\) is feasible for the perturbed primal. We claim \(\mathbf{b}^{T}\mathbf{y} + (u+1)\epsilon_{1}\) is an upper bound on the value of the perturbed primal. Indeed, since \(\mathbf{y}\) is feasible for the perturbed dual, \(A^{T}\mathbf{y} \geq \mathbf{c} - \epsilon_{1}\mathbf{1}\), and thus, for every \(\mathbf{x}'\) feasible for the perturbed primal, a short chain of inequalities (using the feasibility of \(\mathbf{x}'\) for the perturbed primal and the fact that \(\mathbf{y} \geq \mathbf{0}\)) gives \(\mathbf{c}^{T}\mathbf{x}' \leq \mathbf{b}^{T}\mathbf{y} + (u+1)\epsilon_{1}\). Therefore, \(\mathbf{c}^{T}\mathbf{x}\) is a lower bound on the value of the perturbed primal and \(\mathbf{b}^{T}\mathbf{y} + (u+1)\epsilon_{1} \leq \mathbf{c}^{T}\mathbf{x} + (u+2)\epsilon_{1}\) is an upper bound, which completes the proof of validity.

If the entries of A, b, and c are not integer multiples of ϵ, but are instead arbitrary rationals p/q for p,q∈ℤ, we run the above protocol on the derived stream in which each stream element is rounded to the nearest integer multiple of ϵ. (If \(\mathcal {V}\) cannot determine the necessary precision ϵ in advance, we can afford for \(\mathcal {V}\) to calculate it while observing the stream, and then have \(\mathcal {H}\) replay the entire stream.) This introduces error into all of the above calculations, but we can obtain the same approximation guarantees by setting ϵ a polynomial factor smaller than above. This does not affect the asymptotic costs of the protocol.  □
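Rounding a rational stream element to the nearest integer multiple of ϵ can be done exactly; a minimal illustrative helper (not part of the protocol itself) is the following.

```python
import math
from fractions import Fraction

# Round a rational value to the nearest integer multiple of eps, exactly
# (round-half-up), using Fractions to avoid floating-point error.
def round_to_multiple(value: Fraction, eps: Fraction) -> Fraction:
    return math.floor(value / eps + Fraction(1, 2)) * eps

assert round_to_multiple(Fraction(7, 3), Fraction(1, 4)) == Fraction(9, 4)
```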

Appendix B: Proof of Theorem 5

Proof

The dual function of any quadratic programming problem of the form \(\min\{\frac{1}{2}\mathbf{x}^{T}Q\mathbf{x} + \mathbf{c}^{T}\mathbf{x} \mid A\mathbf{x} \leq \mathbf{b}\}\) is given by

$$q(\mu) = -\tfrac{1}{2}\bigl(\mathbf{c}+A^{T}\mu\bigr)^{T}Q^{-1}\bigl(\mathbf{c}+A^{T}\mu\bigr) - \mathbf{b}^{T}\mu \quad \text{for } \mu \geq \mathbf{0},$$

and q(μ)=−∞ otherwise. Strong duality always holds for quadratic programs. That is, for all μ, q(μ) is a lower bound on the value \(f^{*}\) of the primal quadratic program, and moreover there exists a \(\mu^{*} \geq \mathbf{0}\) such that \(q(\mu^{*}) = f^{*}\) [3, Example 5.3.1]. Consequently, we can use a protocol akin to that of Theorem 13 and Corollary 4: essentially \(\mathcal {H}\) proves optimality of a primal solution \(\mathbf{x}\) by providing a dual-optimal solution μ, and proving to \(\mathcal {V}\) that the values of \(\mathbf{x}\) and μ are equal. But since \(\mathcal {H}\) cannot afford to send an exact representation of \(\mathbf{x}\) or μ, \(\mathcal {H}\) instead sends vectors \(\hat{\mathbf{x}}\) and \(\hat{\mu}\) with all entries integer multiples of \(\epsilon=\frac{\epsilon_{1}}{h(\mathcal{A})}\). Here h is some function polynomial in b and c and independent of \(\epsilon_{1}\), to be specified later. These vectors are claimed to be rounded versions of the true optima.
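Assuming the primal has the form \(\min\{\frac{1}{2}\mathbf{x}^{T}Q\mathbf{x} + \mathbf{c}^{T}\mathbf{x} \mid A\mathbf{x} \leq \mathbf{b}\}\) with Q positive definite, the following numerical sketch checks that the closed form of q above agrees with \(\min_{\mathbf{x}} L(\mathbf{x},\mu)\) and lower-bounds the primal objective at feasible points (weak duality); it is a sanity check, not part of the protocol.

```python
import numpy as np

# Sanity check of the QP dual function for the primal
#   minimize (1/2) x^T Q x + c^T x  subject to  A x <= b,  Q positive definite.
# The Lagrangian L(x, mu) = f(x) + mu^T (A x - b) is minimized over x at
# x_bar = -Q^{-1}(c + A^T mu), which yields the closed form for q(mu).
rng = np.random.default_rng(0)
n, m = 5, 4
M = rng.standard_normal((n, n))
Q = M @ M.T + np.eye(n)                       # positive definite
c = rng.standard_normal(n)
A = rng.standard_normal((m, n))
x0 = rng.standard_normal(n)
b = A @ x0 + 1.0                              # chosen so that x0 is strictly feasible

f = lambda x: 0.5 * x @ Q @ x + c @ x
L = lambda x, mu: f(x) + mu @ (A @ x - b)

mu = np.abs(rng.standard_normal(m))           # an arbitrary mu >= 0
x_bar = -np.linalg.solve(Q, c + A.T @ mu)
q_mu = -0.5 * (c + A.T @ mu) @ np.linalg.solve(Q, c + A.T @ mu) - b @ mu

assert np.isclose(q_mu, L(x_bar, mu))         # closed form equals min_x L(x, mu)
assert q_mu <= f(x0) + 1e-9                   # weak duality: q(mu) <= f(x) at feasible x
```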

Let f(x) denote the primal objective function evaluated at x. \(\mathcal {V}\) checks that

  1. \(\epsilon_{1}/\epsilon \geq h(\mathcal{A})\). That is, \(\hat{\mathbf{x}}\) and \(\hat{\mu}\) are represented at sufficiently high precision.

  2. \(A\hat{\mathbf{x}} \leq \mathbf{b} + \epsilon_{1} \mathbf{1}\).

  3. \(\hat{\mu} \geq \mathbf{0}\).

  4. \(|f(\hat{\mathbf{x}}) - q(\hat{\mu})| \leq \epsilon_{1}\).

If all checks pass, \(\mathcal {V}\) outputs \(f(\hat{\mathbf{x}})\), otherwise \(\mathcal {V}\) outputs ⊥.

The first and third checks are trivial to perform, and \(\mathcal {V}\) can perform the second check with a single matrix-vector multiplication operation. To perform the fourth check, \(\mathcal {V}\) first computes \(f(\hat{\mathbf{x}})\) with a single matrix-vector multiplication and two inner product computations. \(\mathcal {V}\) then computes q(μ) by verifying a constant number of matrix-vector multiplications and inner product computations as follows. First \(\mathcal {V}\) computes \(z_{1}\) in a sequence of four matrix-vector multiplications: (1) \(\mathbf{z}_{11} := A^{T}\mu\); (2) \(\mathbf{z}_{12} := Q^{-1}\mathbf{z}_{11}\) (computed by having \(\mathcal {H}\) specify \(\mathbf{z}_{12}\) and verifying that \(Q\mathbf{z}_{12} = \mathbf{z}_{11}\)); (3) \(\mathbf{z}_{13} := A\mathbf{z}_{12}\); and (4) \(z_{1} := -\frac{1}{2}\mu^{T}\mathbf{z}_{13}\). We stress that \(\mathcal {V}\) need not explicitly invert Q or be provided with \(Q^{-1}\) to compute \(\mathbf{z}_{12} := Q^{-1}\mathbf{z}_{11}\); instead, it suffices to verify a single matrix-vector multiplication. Next, \(\mathcal {V}\) computes \(z_{2} := \mu^{T}\mathbf{b}\) with a single inner product computation, and computes \(z_{3} := \mu^{T}AQ^{-1}\mathbf{c}\) and \(z_{4} := \frac{1}{2}\mathbf{c}^{T}Q^{-1}\mathbf{c}\) similarly to \(z_{1}\). Given \(z_{1}, z_{2}, z_{3}\), and \(z_{4}\), \(\mathcal {V}\) may compute the value \(q(\mu) = z_{1} - z_{2} - z_{3} - z_{4}\).
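To make the bookkeeping concrete, the following sketch (same assumed primal form as above) evaluates q(μ) via the quantities \(\mathbf{z}_{11}, \mathbf{z}_{12}, \mathbf{z}_{13}, z_{1}, \ldots, z_{4}\) and confirms that \(z_{1} - z_{2} - z_{3} - z_{4}\) reproduces the closed form; in the protocol itself \(\mathcal {V}\) verifies, rather than computes, each matrix-vector product.

```python
import numpy as np

# Evaluate q(mu) through the intermediate quantities described above,
# assuming the primal form min (1/2) x^T Q x + c^T x subject to A x <= b.
# In the protocol, V checks products supplied by H (e.g. that Q z12 = z11)
# instead of computing them; here everything is computed as a consistency check.
rng = np.random.default_rng(1)
n, m = 5, 4
M = rng.standard_normal((n, n))
Q = M @ M.T + np.eye(n)
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
c = rng.standard_normal(n)
mu = np.abs(rng.standard_normal(m))

z11 = A.T @ mu                                 # (1) z11 := A^T mu
z12 = np.linalg.solve(Q, z11)                  # (2) z12 := Q^{-1} z11
z13 = A @ z12                                  # (3) z13 := A z12
z1 = -0.5 * mu @ z13                           # (4) z1  := -(1/2) mu^T z13
z2 = mu @ b
z3 = mu @ (A @ np.linalg.solve(Q, c))
z4 = 0.5 * c @ np.linalg.solve(Q, c)

q_closed = -0.5 * (c + A.T @ mu) @ np.linalg.solve(Q, c + A.T @ mu) - b @ mu
assert np.isclose(z1 - z2 - z3 - z4, q_closed)
```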

It remains to argue that we can set \(\epsilon = \epsilon_{1}/h(\mathcal{A})\) such that: if \(\hat{\mathbf{x}}\) and \(\hat{\mu}\) are as claimed, then all checks will pass with probability 1, and if all checks pass then with high probability the value of the perturbed QP is \(f(\hat{\mathbf{x}}) \pm \epsilon_{2}\).

Suppose \(\hat{\mathbf{x}}\) and \(\hat{\mu}\) are as claimed. Then the first and third checks will pass, and the second check will pass by exactly the same argument as in Theorem 13. For the final check, let \(\mathbf{e}(\mathbf{x}^{*}) = \hat{\mathbf{x}} - \mathbf{x}^{*}\) and \(\mathbf{e}(\mu^{*}) = \hat{\mu} - \mu^{*}\); every entry of each has absolute value at most ϵ. Consider first \(|f(\hat{\mathbf{x}}) - f(\mathbf{x}^{*})|\): expanding \(f(\mathbf{x}^{*} + \mathbf{e}(\mathbf{x}^{*}))\) yields a sum of terms, each containing at least one entry of \(\mathbf{e}(\mathbf{x}^{*})\).

By repeated application of the triangle inequality to each term in this sum, it can be seen that \(|f(\hat{\mathbf{x}}) - f(\mathbf{x}^{*})|\) is at most ϵ times a quantity polynomial in u, where u is a known upper bound on \(\|A\|_{\infty}\), \(\|Q\|_{\infty}\), \(\|Q^{-1}\|_{\infty}\), \(\|\mathbf{b}\|_{1}\), \(\|\mathbf{c}\|_{1}\), \(\|\mathbf{x}^{*}\|_{1}\) and \(\|\mu^{*}\|_{1}\). Notice that \(\|Q^{-1}\|_{\infty}\) is polynomially bounded as long as Q is sufficiently positive definite, in the sense that all eigenvalues of Q are bounded below by an inverse polynomial. All other quantities for which u is an upper bound are polynomial in b and c by assumption, and \(|f(\hat{\mathbf{x}}) - f(\mathbf{x}^{*})|\) is therefore at most ϵ times a known polynomial in b and c that is independent of \(\epsilon_{1}\).

Similarly, \(|q(\hat{\mu}) - q(\mu^{*})|\) is at most ϵ times a polynomial in b and c that is independent of \(\epsilon_{1}\). Thus, since \(f(\mathbf{x}^{*}) = q(\mu^{*})\), \(|f(\hat{\mathbf{x}}) - q(\hat{\mu})|\) is also at most ϵ times a polynomial that only depends on u and is independent of \(\epsilon_{1}\). We can therefore choose \(h(\mathcal{A})\) to be this polynomial, so that with \(\epsilon = \epsilon_{1}/h(\mathcal{A})\) the final check will pass if \(\hat{\mathbf{x}}\) and \(\hat{\mu}\) are as claimed.

Finally, we argue that if all four checks pass, then \(f(\mathbf{x})\) is a \(g(\mathcal{A})\epsilon_{1}\)-approximation to the value of the perturbed primal for some \(g(\mathcal{A})\) independent of \(\epsilon_{1}\). The argument in this case is simpler than that in Theorem 13, because if the third check passes, then μ is actually feasible for the dual of the perturbed primal, as the only constraint of the dual is that \(\mu \geq \mathbf{0}\). Indeed, let \(q_{P}\) denote the dual of the perturbed primal; it is easy to see that \(q_{P}(\mu) = q(\mu) - \mu^{T}(\epsilon_{1}\mathbf{1})\). By weak duality, for any \(\mathbf{x}\) that is feasible for the perturbed primal, and any μ, \(f(\mathbf{x}) \geq q_{P}(\mu) \geq q(\mu) - u\epsilon_{1}\). Since the fourth check passed, \(f(\mathbf{x}) - q(\mu) \leq \epsilon_{1}\), and therefore \(f(\mathbf{x}) - q_{P}(\mu) \leq (u+1)\epsilon_{1}\). We can let \(g(\mathcal{A}) = u+1\).

By invoking the matrix-vector multiplication protocols of Theorem 3 and Theorem 4 (handling the fact that the vectors are not integral as in Corollary 4) to compute all matrix-vector multiplications in the above description, we obtain the theorem. □


About this article

Cite this article

Cormode, G., Mitzenmacher, M. & Thaler, J. Streaming Graph Computations with a Helpful Advisor. Algorithmica 65, 409–442 (2013). https://doi.org/10.1007/s00453-011-9598-y
