Lock coarsening: Eliminating lock overhead in automatically parallelized object-based programs
Atomic operations are a key primitive in parallel computing systems. The standard implementation mechanism for atomic operations uses mutual exclusion locks. In an object-based programming system the natural granularity is to give each object its own lock. Each operation can then make its execution atomic by acquiring and releasing the lock for the object that it accesses. This fine lock granularity, however, may incur high synchronization overhead, because every operation pays for its own acquire and release. To achieve good performance it may be necessary to reduce that overhead by coarsening the granularity at which the computation locks objects.
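The per-object locking scheme described above can be sketched as follows. This is a minimal illustration and not code from the paper: the `Body` class, its `add_force` operation, and the `CountingLock` wrapper are hypothetical names, and the acquisition counter is only instrumentation that makes the cost of fine-grained locking visible.

```python
import threading


class CountingLock:
    """Mutual exclusion lock that counts acquisitions (illustration only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1  # safe: updated while the lock is held
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()
        return False


class Body:
    """Hypothetical object that carries its own lock."""

    def __init__(self):
        self.lock = CountingLock()
        self.force = 0.0

    def add_force(self, delta):
        # Atomic operation: acquire this object's lock, update, release.
        with self.lock:
            self.force += delta


def run_fine_grained(num_threads=4, updates_per_thread=1000):
    """Run concurrent updates against one object and return it."""
    body = Body()

    def worker():
        for _ in range(updates_per_thread):
            body.add_force(1.0)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return body
```

In this sketch every update is atomic, but the number of lock acquisitions equals the number of operations, which is the overhead the abstract refers to.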
In this paper we describe a static analysis technique, lock coarsening, designed to automatically increase the lock granularity in object-based programs with atomic operations. We have implemented this technique in the context of a parallelizing compiler for irregular, object-based programs. Experiments show the lock coarsening algorithms to be effective in reducing the lock overhead to negligible levels.
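As a rough illustration of what increasing the lock granularity means, the sketch below applies the idea by hand to a loop that repeatedly updates one object. It is not the compiler's analysis or its generated code: the `Body` class, the `interact_original` and `interact_coarsened` functions, and the acquisition counter are assumptions introduced only for this example.

```python
import threading


class Body:
    """Hypothetical object with one lock per object."""

    def __init__(self):
        self.lock = threading.Lock()
        self.force = 0.0
        self.acquisitions = 0  # instrumentation only

    def acquire(self):
        self.lock.acquire()
        self.acquisitions += 1  # updated while the lock is held

    def release(self):
        self.lock.release()

    def add_force(self, delta):
        # Fine-grained version: each operation locks the object itself.
        self.acquire()
        try:
            self.force += delta
        finally:
            self.release()

    def add_force_unlocked(self, delta):
        # Lock-free variant: the caller must already hold self.lock.
        self.force += delta


def interact_original(body, deltas):
    # One acquire/release pair per update.
    for d in deltas:
        body.add_force(d)


def interact_coarsened(body, deltas):
    # Coarsened: one acquire/release pair around the whole sequence.
    body.acquire()
    try:
        for d in deltas:
            body.add_force_unlocked(d)
    finally:
        body.release()
```

Both functions compute the same result, but the coarsened one acquires the lock once rather than once per update. The trade-off is that the lock is held for longer, which can reduce the concurrency available to other threads that need the same object.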