# Ensemble Architectures and their Algorithms: An Overview

## Abstract

In recent years the number of commercially available parallel computer architectures has increased dramatically. The number of processors in these systems varies from a few up to 64*k* for the Connection Machine. In this paper we discuss some of the technology issues that are the underlying driving force, and focus on a particular class of parallel computer architectures often called Ensemble Architectures, which are interesting candidates for future high performance computing systems. The ensemble configurations discussed here are linear arrays, 2-dimensional arrays, binary trees, shuffle-exchange networks, Boolean cubes, and cube connected cycles. We discuss a few algorithms for arbitrary data permutations, and some particular data permutation and distribution algorithms used in standard matrix computations. Special attention is given to data routing. Distributed routing algorithms in which elements with distinct origins and distinct destinations do not traverse the same communications link make possible a maximum degree of pipelined communication. The linear algebra computations discussed are matrix transposition, matrix multiplication, dense and general banded system solvers, linear recurrence solvers, tridiagonal system solvers, fast Poisson solvers, and, very briefly, iterative methods.

## Keywords

Binary Tree, Gaussian Elimination, Computation Graph, Systolic Array, Gray Code
