Latency and bandwidth requirements of massively parallel programs: FFT as a case study
In this paper we compare three routing algorithms for massively parallel architectures, each offering a greater degree of adaptivity than the last: a deterministic algorithm, a minimal adaptive algorithm based on Duato's methodology, and a non-minimal adaptive algorithm, Chaos routing. Rather than using a synthetic benchmark, we carry out the comparison with a real application, the transpose FFT algorithm. Simulation results collected on two-dimensional tori with up to 256 processing nodes show that both adaptive algorithms suffer from post-saturation problems that degrade network throughput.
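To make the deterministic baseline concrete, the following is a minimal sketch of dimension-order routing on a k x k torus. The abstract does not say which deterministic algorithm was simulated, so dimension-order routing is an assumption here, chosen because it is a common deterministic scheme; the function name and the tie-breaking rule for equidistant directions are likewise illustrative.

```python
def dimension_order_route(src, dst, k):
    """Return the nodes visited from src to dst on a k x k torus.

    Assumed scheme (not taken from the paper): correct the x coordinate
    first, then y, moving the shorter way around each ring.
    """
    def step(a, b):
        # +1 or -1 along the shorter direction of the ring, 0 if aligned.
        if a == b:
            return 0
        forward = (b - a) % k
        # Ties (exactly half the ring) are broken in the +1 direction.
        return 1 if forward <= k - forward else -1

    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x = (x + step(x, dst[0])) % k
        path.append((x, y))
    while y != dst[1]:
        y = (y + step(y, dst[1])) % k
        path.append((x, y))
    return path


# A 16 x 16 torus has 256 nodes, the largest size mentioned above.
route = dimension_order_route((0, 0), (15, 2), 16)
```

Because the path depends only on the source and destination, every message between a given pair of nodes follows the same links. The adaptive algorithms differ in that they may choose among several output links at each hop.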
- 1. Kevin Bolding. Chaotic Routing: Design and Implementation of an Adaptive Multicomputer Network Router. PhD thesis, University of Washington, Department of Computer Science and Engineering, Seattle, WA, July 1993.
- 2. Andrew A. Chien. A Cost and Speed Model for k-ary n-cube Wormhole Routers. In Hot Interconnects '93, Palo Alto, California, August 1993.
- 3. William J. Dally et al. The Message-Driven Processor. IEEE Micro, pages 23–39, April 1992.
- 4. William J. Dally and Charles L. Seitz. Deadlock-Free Message Routing in Multiprocessor Interconnection Networks. IEEE Transactions on Computers, C-36(5):547–553, May 1987.
- 5. José Duato. A Necessary and Sufficient Condition for Deadlock-Free Adaptive Routing in Wormhole Networks. IEEE Transactions on Parallel and Distributed Systems, 6(10):1055–1067, October 1995.
- 6. Anshul Gupta and Vipin Kumar. The Scalability of FFT on Parallel Computers. IEEE Transactions on Parallel and Distributed Systems, 4(8):922–932, August 1993.
- 7. F. Thomson Leighton. Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes. Morgan Kaufmann Publishers, San Mateo, CA, USA, 1992.