Shared memory, vectors, message passing, and scalability
The many zealots of parallel computation are split into several factions, each believing that it has the best (or sometimes, only) possible architectural idea for the fast computers of the future. Comparisons between the ideas have been rare, possibly because the zealots don't like to acknowledge the merits of each other's architectures. This paper compares notions from two popular types of MIMD [Flyn72] parallel computers: the message passing systems with many processors like those manufactured by Ncube and Intel, and the traditional shared memory systems of a few vector processors, Cray's products for example. Messages are compared with vectors, sending and receiving a message is compared with loading and storing a vector, and the idea of using parallelism to keep a waiting processor busy is compared with the idea of using parallelism to keep a vector pipeline busy. The conclusions are that a good shared memory system and a good message passing system are indistinguishable, and that parallelism is used in both kinds of systems not just to translate increased hardware into increased speed but also to achieve good hardware utilization in spite of nonlocality in space or time.
Keywords: Shared Memory, Cache Line, Memory Latency, Virtual Processor, Message Delay
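The parallel the abstract draws between keeping a vector pipeline busy and keeping a waiting processor busy can be illustrated with a back-of-the-envelope model. The sketch below is not taken from the paper: the function names, the parameters (`startup_cycles`, `run_cycles`, `message_delay`), and the simple linear cost model are all assumptions made for illustration. It assumes a pipeline that delivers one result per cycle after a fixed startup, and a processor that switches among independent tasks at no cost.

```python
def vector_pipeline_efficiency(vector_length, startup_cycles):
    """Fraction of cycles in which a pipeline delivers a result.

    Assumed model: one vector operation costs startup_cycles plus one
    cycle per element, so longer vectors amortize the startup latency.
    """
    return vector_length / (startup_cycles + vector_length)


def processor_utilization(ready_tasks, run_cycles, message_delay):
    """Fraction of cycles in which a processor does useful work.

    Assumed model: each task computes for run_cycles, then waits
    message_delay cycles for a message; the processor switches to
    another ready task at no cost while one is waiting.
    """
    return min(1.0, ready_tasks * run_cycles / (run_cycles + message_delay))


if __name__ == "__main__":
    # More elements per vector hide the pipeline startup.
    for n in (8, 64, 512):
        print("vector length", n, "efficiency", vector_pipeline_efficiency(n, 64))
    # More ready tasks per processor hide the message delay.
    for k in (1, 5, 10):
        print("ready tasks", k, "utilization", processor_utilization(k, 10, 90))
```

In both functions the extra parallelism (longer vectors, more ready tasks) does not add hardware; it only raises the utilization of hardware that would otherwise sit idle during a latency, which is the second role of parallelism named in the abstract.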
- [Flyn72] M. J. Flynn, “Some Computer Organizations and their Effectiveness”, IEEE Transactions on Computers 21, pp. 948–960 (September 1972).
- [LuFa87] O. M. Lubeck and V. Faber, “Modeling the Performance of Hypercubes: a Case Study Using the Particle-in-cell Application”, Los Alamos Computer Research and Applications Group Report LA-UR87-1522 (1987).
- [GKLS83] D. Gajski, D. Kuck, D. Lawrie, and A. Sameh, “CEDAR-A Large Scale Multiprocessor”, Proc. 1983 International Conference on Parallel Processing, pp. 524–529.
- [PBGH85] G. Pfister, W. Brantley, D. George, S. Harvey, W. Kleinfelder, K. McAuliffe, E. Melton, V. Norton, and J. Weiss, “The IBM Research Parallel Processor Prototype (RP3): Introduction and Architecture”, Proc. 1985 International Conference on Parallel Processing, pp. 764–771.
- [LeYL87] R. L. Lee, P-C. Yew, and D. H. Lawrie, “Multiprocessor Cache Design Considerations”, Proc. 14th Annual International Symposium on Computer Architecture, pp. 253–262 (1987).
- [ArCu86] Arvind and D. E. Culler, “Dataflow Architectures”, Annual Review of Computer Science 1986 1, pp. 225–253.
- [Smit78] B. J. Smith, “A Pipelined, Shared Resource MIMD Computer”, Proc. 1978 International Conference on Parallel Processing, pp. 6–8.