When distributed computation is communication expensive

Distributed Computing

Abstract

We consider a number of fundamental statistical and graph problems in the message-passing model, where we have \(k\) machines (sites), each holding a piece of data, and the machines want to jointly solve a problem defined on the union of the \(k\) data sets. The communication is point-to-point, and the goal is to minimize the total communication among the \(k\) machines. This model captures all point-to-point distributed computational models with respect to minimizing communication costs. Our analysis shows that exact computation of many statistical and graph problems in this distributed setting requires a prohibitively large amount of communication, and often one cannot improve upon the communication of the simple protocol in which all machines send their data to a centralized server. Thus, in order to obtain protocols that are communication-efficient, one has to allow approximation, or investigate the distribution or layout of the data sets.
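As a rough illustration of the baseline mentioned above, the following minimal sketch simulates the simple protocol in which every site sends its whole data set to a coordinator and tallies the total communication. The function name `naive_protocol`, the choice of counting distinct elements as the problem, and the cost model of \(\lceil \log_2 n \rceil\) bits per item over a universe of size \(n\) are assumptions made for this example, not details taken from the article.

```python
from math import ceil, log2


def naive_protocol(site_data, universe_size):
    """Hypothetical helper: every site ships all of its items to a coordinator.

    Returns the number of distinct items in the union and the total number
    of bits sent, assuming each item costs ceil(log2(universe_size)) bits.
    """
    bits_per_item = max(1, ceil(log2(universe_size)))
    total_bits = 0
    union = set()
    for items in site_data:
        # One point-to-point message per site, carrying its entire data set.
        total_bits += len(items) * bits_per_item
        union |= set(items)
    return len(union), total_bits


# Three sites (k = 3) holding overlapping pieces of the data.
sites = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}]
distinct, cost = naive_protocol(sites, universe_size=8)
print(distinct, cost)
```

Under this cost model the communication grows with the total input size rather than with the size of the answer, which is the cost the abstract says often cannot be improved upon for exact computation.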

Notes

  1. In the comparison, we neglect the constants hidden in the big-\(O\) and big-\({\varOmega }\) notation, which should be small.

  2. We can also choose, for example, \(P_1\) to be the coordinator and avoid the need for an additional site, though having an additional site makes the notation cleaner.

  3. We conjectured Theorem 5 in the conference version of this paper.

Author information

Corresponding author

Correspondence to Qin Zhang.

Additional information

A preliminary version of this article appeared in Proceedings of the 27th International Symposium on Distributed Computing.

Cite this article

Woodruff, D.P., Zhang, Q. When distributed computation is communication expensive. Distrib. Comput. 30, 309–323 (2017). https://doi.org/10.1007/s00446-014-0218-3

  • DOI: https://doi.org/10.1007/s00446-014-0218-3