Communication complexity of PRAMs
We propose a model for the concurrent read exclusive write PRAM that captures its communication and computational requirements. For this model, we present several results, including the following:
Two \(n \times n\) matrices can be multiplied in \(O(n^3/p)\) computation time and \(O(n^2/p^{2/3})\) communication delay using \(p\) processors (for \(p \le n^3/\log^{3/2} n\)). Furthermore, these bounds are optimal for arithmetic on semirings (using \(+\), \(\times\) only). For sorting and for FFT graphs, it is shown that communication delay of \(\Omega(n \log n/(p \log(n/p)))\) is required for \(p \le n/\log n\). This bound is tight for FFT graphs; it is also shown to be tight for sorting provided \(p \le n^{1-\varepsilon}\) for any fixed \(\varepsilon > 0\).
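The shape of the matrix multiplication bounds can be illustrated with a small simulation. The abstract does not describe the algorithm itself, so the sketch below assumes one standard scheme that matches the stated orders of growth: the \(n \times n \times n\) space of scalar products is cut into \(p = q^3\) cubes of side \(n/q\), one per processor. The function name `blocked_matmul` and the parameter `q` are hypothetical, and the sketch ignores the cost of combining the partial sums written to the result, so it is not a model of communication delay as defined in the paper.

```python
from itertools import product


def blocked_matmul(A, B, q):
    """Multiply square matrices A and B by splitting the work into q**3 cubes.

    Each cube stands in for one simulated processor (an assumption of this
    sketch). Returns the product together with, per processor, the number of
    distinct input entries it touched and the number of multiplications it did.
    """
    n = len(A)
    if n % q != 0:
        raise ValueError("q must divide n")
    b = n // q  # side length of each cube
    C = [[0] * n for _ in range(n)]
    reads_per_proc = []
    mults_per_proc = []
    for bi, bj, bk in product(range(q), repeat=3):
        touched = set()  # distinct entries of A and B this processor reads
        mults = 0
        for i in range(bi * b, (bi + 1) * b):
            for j in range(bj * b, (bj + 1) * b):
                acc = 0
                for k in range(bk * b, (bk + 1) * b):
                    touched.add(("A", i, k))
                    touched.add(("B", k, j))
                    acc += A[i][k] * B[k][j]
                    mults += 1
                C[i][j] += acc  # partial sum; combining cost is not counted
        reads_per_proc.append(len(touched))
        mults_per_proc.append(mults)
    return C, reads_per_proc, mults_per_proc


if __name__ == "__main__":
    n, q = 4, 2
    A = [[i * n + j + 1 for j in range(n)] for i in range(n)]
    I = [[int(i == j) for j in range(n)] for i in range(n)]
    C, reads, mults = blocked_matmul(A, I, q)
    print(C == A, reads[0], mults[0])
```

With \(p = q^3\), each simulated processor performs \((n/q)^3 = n^3/p\) multiplications and reads \(2(n/q)^2 = 2n^2/p^{2/3}\) distinct inputs, which is where the two exponents in the bounds come from under this partition.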
Given a binary tree, \(\tau\), with \(n\) leaves and height \(h\), let \(D_{opt}(\tau)\) denote the minimum communication delay needed to compute \(\tau\). It is shown that \(\Omega(\log n) \le D_{opt}(\tau) \le O(\sqrt{n})\), and \(\Omega(\sqrt{h}) \le D_{opt}(\tau) \le O(h)\), all bounds being the best possible. We also present a simple polynomial algorithm that generates a schedule for computing \(\tau\) with at most \(2D_{opt}(\tau)\) delay.
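To get a feel for how the two pairs of bounds relate, the following sketch computes \(n\) and \(h\) for a small tree and evaluates the four expressions \(\log n\), \(\sqrt{n}\), \(\sqrt{h}\) and \(h\). The nested-tuple tree encoding and the function names are assumptions made for illustration. The hidden constants in the asymptotic bounds are dropped, so the numbers only indicate orders of growth and say nothing about the actual value of \(D_{opt}(\tau)\).

```python
import math


def leaves_and_height(tree):
    """Return (number of leaves, height) of a binary tree.

    Encoding (hypothetical): an internal node is a pair (left, right);
    anything that is not a tuple is a leaf of height 0.
    """
    if not isinstance(tree, tuple):
        return 1, 0
    left_n, left_h = leaves_and_height(tree[0])
    right_n, right_h = leaves_and_height(tree[1])
    return left_n + right_n, 1 + max(left_h, right_h)


def bound_shapes(tree):
    """Evaluate the four bound expressions with all constants dropped."""
    n, h = leaves_and_height(tree)
    return {
        "n": n,
        "h": h,
        "log_n": math.log2(n),    # shape of the lower bound in terms of n
        "sqrt_n": math.sqrt(n),   # shape of the upper bound in terms of n
        "sqrt_h": math.sqrt(h),   # shape of the lower bound in terms of h
        "h_upper": h,             # shape of the upper bound in terms of h
    }


if __name__ == "__main__":
    complete = (("a", "b"), ("c", "d"))       # balanced: h = log2(n)
    caterpillar = ((("a", "b"), "c"), "d")    # path-like: h = n - 1
    print(bound_shapes(complete))
    print(bound_shapes(caterpillar))
```

The two example shapes are the extremes for a fixed number of leaves: a complete tree has the smallest possible height, \(\log_2 n\), while a caterpillar has the largest, \(n - 1\).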
It is shown that the communication delay-computation time tradeoff given by Papadimitriou and Ullman for a diamond DAG can be achieved for essentially two values of the computation time. We also present DAGs that exhibit proper tradeoffs for a substantial range of time.
Keywords: Computation Time, Binary Tree, Directed Acyclic Graph, Communication Complexity, Global Memory
- [A80] H. Abelson, "Lower Bounds on Information Transfer in Distributed Systems," J. of ACM, Vol. 27, pp. 384–392, 1980.
- [AHU74] A. V. Aho, J. E. Hopcroft and J. D. Ullman, "The Design and Analysis of Computer Algorithms," Addison Wesley, 1974.
- [AUY83] A. V. Aho, J. D. Ullman, and M. Yannakakis, "On Notions of Information Transfer in VLSI Circuits," Proc. 15th Annual ACM Symp. on Theory of Computing, pp. 133–139, 1983.
- [Co86] R. Cole, "Parallel Merge Sort," Proc. 27th Annual IEEE Conf. on Foundations of Computer Science, pp. 511–516, 1986.
- [DGS83] P. Duris, Z. Galil, and G. Schnitger, "Lower Bounds on Communication Complexity," Proc. 16th Annual ACM Symp. on Theory of Computing, pp. 133–139, 1983.
- [HK81] J. W. Hong and H. T. Kung, "I/O Complexity: the Red-Blue Pebble Game," Proc. of 13th Annual ACM Symp. on Theory of Computing, pp. 326–333, May 1981.
- [JK84] J. Ja'Ja' and P. Kumar, "Information Transfer in Distributed Computing with Applications to VLSI," J. of ACM, Vol. 31, pp. 150–162, 1984.
- [K70] L. R. Kerr, "The Effect of Algebraic Structure on the Computational Complexity of Matrix Multiplications," Ph.D. Thesis, Cornell University, 1970.
- [L85] F. T. Leighton, "Tight Bounds on the Complexity of Parallel Sorting," IEEE Trans. on Computers, Vol. C-34, No. 3, April 1985.
- [PS81] C. H. Papadimitriou and M. Sipser, "Communication Complexity," J. of ACM, Vol. 28, pp. 260–268, 1981.
- [PU87] C. H. Papadimitriou and J. D. Ullman, "A Communication-Time Tradeoff," SIAM J. of Computing, Vol. 16, pp. 639–647, Aug. 1987.
- [PY88] C. H. Papadimitriou and M. Yannakakis, "Towards an Architecture-Independent Analysis of Parallel Algorithms," 20th Ann. ACM Symp. on Theory of Computing, 1988.
- [S88] M. Snir, personal communication.
- [T79] C. D. Thompson, "Area-Time Complexity for VLSI," Proc. 11th Annual ACM Symp. on Theory of Computing, pp. 81–88, 1979.
- [T84] P. Tiwari, "Lower Bounds on Communication Complexity in Distributed Computer Networks," Proc. 25th Annual IEEE Symp. on Foundations of Computer Science, pp. 109–117, 1984.
- [WF81] C. L. Wu and T. Y. Feng, "The Universality of the Shuffle-Exchange Network," IEEE Trans. on Computers, Vol. C-30, No. 5, pp. 324–332, May 1981.
- [Y79] A. C.-C. Yao, "Some Complexity Questions Related to Distributive Computing," Proc. 11th Annual ACM Symp. on Theory of Computing, pp. 209–213, 1979.