Synchronization issues in data-parallel languages
Data-parallel programming has established itself as the preferred way of programming a large class of scientific applications. In this paper, we address the issue of reducing synchronization costs when implementing a data-parallel language on an asynchronous architecture. The synchronization issue is addressed from two perspectives: first, we describe language constructs that allow the programmer to specify that different parts of a data-parallel program be synchronized at different levels of granularity; second, we show how existing tools and algorithms for data dependency analysis can be used by the compiler both to reduce the number of barriers and to replace global barriers with cheaper clustered synchronizations. Although the techniques presented in the paper are general purpose, we describe them in the context of a data-parallel language called UC developed at UCLA. Reducing the number of barriers improves program execution time by cutting both synchronization overhead and processor stall time.