Parallel Computing 1988 pp 1-22

# Parallel algorithms and architectures

## Abstract

In this paper we consider some of the central issues involved in the design of parallel algorithms. We describe several efficient algorithms for idealised shared memory architectures and draw some conclusions as to what would be required to implement them on a realistic physical architecture, i.e. one with distributed memory. We also describe some systolic algorithms for matrix computations, sequence comparison and molecular modelling, and briefly discuss their implementation on arrays of transputers. In the final section we discuss the question of whether the current preoccupation with architectural details in parallel algorithm design is likely to persist. We briefly describe some techniques which show that a physically realistic general purpose parallel architecture based on distributed memory can be constructed which will execute any shared memory parallel algorithm with no significant overhead due to communication. We thus have the attractive prospect in the very near future of architectural independence in parallel algorithm design.
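To make the idea of a systolic algorithm for sequence comparison concrete, the following is a minimal sketch, not the algorithm from the paper, whose details the abstract does not give. It assumes unit-cost edit distance as the comparison measure and uses the standard wavefront ordering, in which all cells on one anti-diagonal of the dynamic programming table are independent and could be updated simultaneously by a linear array of processors. The function name `edit_distance_wavefront` is hypothetical, and the parallel step is simulated here by a sequential inner loop.

```python
def edit_distance_wavefront(a: str, b: str) -> int:
    """Unit-cost edit distance, computed one anti-diagonal at a time."""
    m, n = len(a), len(b)
    # d[i][j] holds the edit distance between a[:i] and b[:j].
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i characters of a[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all j characters of b[:j]

    # Step t updates every cell with i + j == t. These cells read only
    # values produced at steps t-1 and t-2, so they are mutually
    # independent. A systolic array would update them in one time step;
    # this sketch (an assumption for illustration) loops over them.
    for t in range(2, m + n + 1):
        for i in range(max(1, t - n), min(m, t - 1) + 1):
            j = t - i
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
    return d[m][n]
```

Under these assumptions the outer loop runs for m + n - 1 steps on non-empty inputs, which is the number of time steps a wavefront schedule would need, compared with m × n cell updates performed one after another in the sequential table fill.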

## Keywords

algorithms, complexity, computer architecture, molecular modelling, parallel computation, routing, sequence comparison, systolic algorithms