When Distributed Computation Is Communication Expensive

  • David P. Woodruff
  • Qin Zhang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8205)

Abstract

We consider a number of fundamental statistical and graph problems in the message-passing model, where we have k machines (sites), each holding a piece of data, and the machines want to jointly solve a problem defined on the union of the k data sets. The communication is point-to-point, and the goal is to minimize the total communication among the k machines. This model captures all point-to-point distributed computational models with respect to minimizing communication costs. Our analysis shows that exact computation of many statistical and graph problems in this distributed setting requires a prohibitively large amount of communication, and often one cannot improve upon the communication of the simple protocol in which all machines send their data to a centralized server. Thus, in order to obtain protocols that are communication-efficient, one has to allow approximation, or investigate the distribution or layout of the data sets.
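The baseline protocol mentioned above, in which every machine sends its data to one server, can be simulated in a few lines. The sketch below is only an illustration, not the authors' construction. It uses exact distinct-element counting as the example problem. The function name `naive_distinct_elements`, the choice of site 0 as the server, and the cost of ceil(log2(universe_size)) bits per item are all assumptions made for the example.

```python
import math


def naive_distinct_elements(site_data, universe_size):
    """Simulate the trivial protocol: every site ships its whole data set to site 0.

    Returns the exact number of distinct elements in the union of all sites
    and the total number of bits sent, under an assumed fixed-width encoding.
    """
    # Assumed cost model: each item costs ceil(log2(universe_size)) bits.
    bits_per_item = max(1, math.ceil(math.log2(universe_size)))

    # Site 0 plays the role of the server, so its own data costs nothing.
    server_view = set(site_data[0])
    total_bits = 0

    for items in site_data[1:]:
        # One point-to-point message from this site to the server.
        total_bits += len(items) * bits_per_item
        server_view.update(items)

    return len(server_view), total_bits


# Three sites holding overlapping items from a universe of size 8.
sites = [[1, 2, 3], [3, 4], [4, 5, 6, 7]]
distinct, cost = naive_distinct_elements(sites, universe_size=8)
print(distinct, cost)
```

In this simulation the cost grows linearly with the total amount of data held away from the server. The abstract states that, for exact computation of many problems, this cost often cannot be improved upon.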

Keywords

Error Probability, Communication Cost, Distinct Element, Communication Complexity, Conjunctive Query
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • David P. Woodruff (1)
  • Qin Zhang (2)
  1. IBM Research Almaden, USA
  2. Indiana University Bloomington, USA
