CSTallocator: Call-Site Tracing Based Shared Memory Allocator for False Sharing Reduction in Page-Based DSM Systems
False sharing results from the co-location of unrelated data in the same unit of memory coherency, and it is a source of overhead that contributes nothing to keeping memory coherent in multiprocessor systems. Moreover, the damage caused by false sharing grows with the granularity of memory coherency. To reduce false sharing in page-based DSM systems, unrelated data objects that have different access patterns must be allocated to separate shared pages. In this paper we propose a call-site tracing-based shared memory allocator, CSTallocator for short. CSTallocator assumes that data objects requested from different call sites are likely to have different access patterns in the future. It therefore places data objects requested from different call sites into separate shared pages, so that data objects requested from the same call site tend to be gathered into the same shared pages. We use execution-driven simulation of real parallel applications to evaluate the effectiveness of CSTallocator. Our observations show that CSTallocator outperforms the existing dynamic shared memory allocator.
Keywords: False Sharing, Distributed Shared Memory, Dynamic Memory Allocation, Call Site Tracing
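The placement policy described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the paper's implementation: the class name `CallSiteAllocator`, the 4096-byte page size, the bump-pointer layout inside a page, and the use of the caller's file name and line number as the call-site identity are all hypothetical choices made for illustration. The model returns `(page, offset)` pairs instead of real shared memory addresses.

```python
import sys

PAGE_SIZE = 4096  # assumed page size in bytes; the abstract does not state one


class CallSiteAllocator:
    """Toy model: each allocation call site fills its own pages."""

    def __init__(self, page_size=PAGE_SIZE):
        self.page_size = page_size
        self.next_page = 0   # next unused page number
        self.open_page = {}  # call site -> (page number, bytes used so far)

    def alloc(self, size, site=None):
        """Return (page, offset) for an object of `size` bytes."""
        if size <= 0 or size > self.page_size:
            raise ValueError("size must be between 1 and page_size")
        if site is None:
            # Identify the caller by file name and line number
            # (sys._getframe is CPython-specific; an assumption of this sketch).
            frame = sys._getframe(1)
            site = (frame.f_code.co_filename, frame.f_lineno)
        page, used = self.open_page.get(site, (None, self.page_size))
        if used + size > self.page_size:
            # First request from this site, or its page is full: open a new page.
            page, used = self.next_page, 0
            self.next_page += 1
        self.open_page[site] = (page, used + size)
        return page, used


def make_counters(allocator, n):
    return [allocator.alloc(8) for _ in range(n)]     # call site A


def make_buffers(allocator, n):
    return [allocator.alloc(1024) for _ in range(n)]  # call site B


allocator = CallSiteAllocator()
counters = make_counters(allocator, 3)
buffers = make_buffers(allocator, 5)
print(counters)
print(buffers)
```

In this run the three counters share page 0, while the five buffers occupy pages 1 and 2, so objects from the two call sites never share a page. A real allocator would also have to handle deallocation and objects larger than a page, which this sketch leaves out.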