A scalable implementation of barrier synchronization using an adaptive combining tree
Abstract
Barrier synchronization is commonly used for synchronizing processors prior to a join operation and to enforce data dependencies during the execution of parallelized loops. Simple software implementations of barrier synchronization can result in memory hot-spots, especially in large scale shared-memory multiprocessors containing hundreds of processors and memory modules communicating through an interconnection network. A software combining tree can be used to substantially reduce memory contention due to hot-spots. However, such an implementation results in O(log n) latency in recognition of barrier synchronization, where n is the number of processors. In this paper an adaptive software combining tree is used to implement a scalable barrier with O(1) recognition latency. The processors that arrive early at the barrier adapt the combining tree so that it has a structure appropriate for reducing the latency for the processors that arrive later. We also show how adaptive combining trees can be used to implement the fuzzy barrier. The fuzzy barrier mechanism reduces the idling of processors at the barriers by allowing the processors to execute useful instructions while they are waiting at the barrier.
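To make the baseline concrete, the following is a minimal sketch of a static software combining tree barrier, not the adaptive scheme proposed in the paper. All names (`Node`, `CombiningTreeBarrier`, `arrive`, `depart`, `demo`) are hypothetical, and the release step uses a single Python condition variable for brevity where a real shared-memory implementation would more likely spin on distributed flags. The last thread to arrive climbs from its leaf to the root, which is where the O(log n) recognition latency comes from. Splitting the barrier into `arrive` and `depart` also gives a rough picture of the fuzzy barrier idea: work that does not depend on other threads can run between the two calls.

```python
import threading

class Node:
    """One counter in the combining tree; 'expected' is its fan-in."""
    def __init__(self, expected):
        self.expected = expected
        self.count = 0
        self.parent = None
        self.lock = threading.Lock()

class CombiningTreeBarrier:
    """Static combining tree barrier (illustrative, not the adaptive variant)."""
    def __init__(self, n, fan_in=2):
        if n < 1 or fan_in < 2:
            raise ValueError("need n >= 1 and fan_in >= 2")
        # Leaf level: each node combines up to fan_in threads.
        level = [Node(min(fan_in, n - j)) for j in range(0, n, fan_in)]
        self.leaf = [level[i // fan_in] for i in range(n)]
        self.depth = 1
        # Build upper levels until a single root remains.
        while len(level) > 1:
            m = len(level)
            upper = [Node(min(fan_in, m - j)) for j in range(0, m, fan_in)]
            for idx, node in enumerate(level):
                node.parent = upper[idx // fan_in]
            level = upper
            self.depth += 1
        self.generation = 0
        self.cond = threading.Condition()

    def arrive(self, i):
        """Signal arrival of thread i; returns a token for depart()."""
        with self.cond:
            gen = self.generation
        node = self.leaf[i]
        while node is not None:
            with node.lock:
                node.count += 1
                last = node.count == node.expected
                if last:
                    node.count = 0  # reset so the barrier can be reused
            if not last:
                return gen          # an earlier arrival stops climbing here
            node = node.parent      # the last arrival at a node moves up
        # Only the last arrival at the root reaches this point: release all.
        with self.cond:
            self.generation += 1
            self.cond.notify_all()
        return gen

    def depart(self, gen):
        """Block until the barrier episode identified by gen is complete."""
        with self.cond:
            while self.generation == gen:
                self.cond.wait()

    def wait(self, i):
        self.depart(self.arrive(i))

def demo(n=8, rounds=3, fan_in=2):
    """Return True if no thread ever leaves a barrier before all n arrive."""
    barrier = CombiningTreeBarrier(n, fan_in)
    arrived = [0] * rounds
    lock = threading.Lock()
    ok = []

    def worker(i):
        for r in range(rounds):
            with lock:
                arrived[r] += 1
            token = barrier.arrive(i)
            _ = sum(range(100))     # independent work in the "fuzzy" region
            barrier.depart(token)
            with lock:
                ok.append(arrived[r] == n)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(ok) == n * rounds and all(ok)
```

With 8 threads and a fan-in of 2 the tree has three levels, so the last arrival updates three counters before the release. No single counter is touched by more than `fan_in` threads, which is how the tree avoids the hot spot of one shared counter.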
Key Words
Memory hot spots, software combining tree, fuzzy barrier, interconnection networks, processor synchronization