Parallel algorithms and architectures

  • W. F. McColl
Parallel Programming Techniques
Part of the Lecture Notes in Computer Science book series (LNCS, volume 384)


In this paper we consider some of the central issues involved in the design of parallel algorithms. We describe several efficient algorithms for idealised shared memory architectures and draw some conclusions as to what would be required to implement them on a realistic physical architecture, i.e. one with distributed memory. We also describe some systolic algorithms for matrix computations, sequence comparison and molecular modelling, and briefly discuss their implementation on arrays of transputers. In the final section we discuss the question of whether the current preoccupation with architectural details in parallel algorithm design is likely to persist. We briefly describe some techniques which show that a physically realistic general purpose parallel architecture based on distributed memory can be constructed which will execute any shared memory parallel algorithm with no significant overhead due to communication. We thus have the attractive prospect in the very near future of architectural independence in parallel algorithm design.
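The paper itself gives no code, but the systolic algorithms for sequence comparison it mentions are typically built on the edit-distance dynamic programme, whose cells on a common anti-diagonal are mutually independent and can therefore be computed simultaneously by a linear array of cells. As an illustration (not taken from the paper), the following sketch evaluates the standard Levenshtein recurrence in exactly that anti-diagonal wavefront order:

```python
def edit_distance_wavefront(a, b):
    """Levenshtein distance, evaluated anti-diagonal by anti-diagonal.

    Cells with i + j = k depend only on diagonals k-1 and k-2, so a
    systolic array could compute each diagonal's cells in parallel;
    this sequential sketch merely honours that evaluation order.
    """
    m, n = len(a), len(b)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):          # deleting i characters of a
        D[i][0] = i
    for j in range(n + 1):          # inserting j characters of b
        D[0][j] = j
    for k in range(2, m + n + 1):   # sweep the anti-diagonals i + j = k
        for i in range(max(1, k - n), min(m, k - 1) + 1):
            j = k - i
            D[i][j] = min(D[i - 1][j] + 1,                       # deletion
                          D[i][j - 1] + 1,                       # insertion
                          D[i - 1][j - 1] + (a[i - 1] != b[j - 1]))  # substitution
    return D[m][n]
```

On a systolic array each diagonal's cells would occupy distinct processors, with the two previous diagonals streaming past as inputs; here the double loop simply replays that schedule on one processor.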


Keywords: algorithms, complexity, computer architecture, molecular modelling, parallel computation, routing, sequence comparison, systolic algorithms



Copyright information

© Springer-Verlag Berlin Heidelberg 1989

Authors and Affiliations

  • W. F. McColl
    1. Programming Research Group, Oxford University, Oxford, England
