Abstract
Writing a parallel shared-memory application that performs well and scales as the number of threads increases can be challenging. One reason is that, as threads proliferate, contention for shared resources increases, which may degrade performance. In particular, multi-threaded applications can suffer from false sharing, in which threads update distinct data items that happen to reside on the same cache line; this can degrade application performance significantly. The work in this paper focuses on detecting performance bottlenecks caused by false sharing in OpenMP applications. We introduce a dynamic framework that helps application developers detect instances of false sharing and identify the data objects in an OpenMP code that cause the problem. The framework leverages features of the OpenMP collector API to interact with the OpenMP compiler’s runtime library and uses information from hardware counters. We demonstrate the usefulness of this framework on actual applications that exhibit poor scaling because of false sharing. To show the benefit of our technique, we manually modify the identified problem code by adjusting the alignment of the data that cause false sharing, and we then compare its performance with that of the original version.
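As an illustration of the kind of alignment adjustment mentioned above, the following is a minimal sketch and not the code modified in the paper. It assumes a 64-byte cache line (hardware dependent), and the names `CACHE_LINE`, `NUM_SLOTS`, `packed_counter`, `padded_counter`, and `sum_padded` are hypothetical. Each accumulator is aligned to its own cache line, so threads updating neighbouring accumulators should not invalidate each other's lines.

```c
#include <assert.h>    /* static_assert */
#include <stdalign.h>  /* alignas */

#define CACHE_LINE 64  /* assumed cache line size in bytes; hardware dependent */
#define NUM_SLOTS 8    /* hypothetical number of per-thread accumulators */

/* Unpadded layout: several adjacent counters fit on one cache line,
 * so concurrent updates by different threads may falsely share it. */
typedef struct { long value; } packed_counter;

/* Aligned layout: each counter starts on its own cache line, and the
 * struct size is rounded up to the alignment. */
typedef struct { alignas(CACHE_LINE) long value; } padded_counter;

static_assert(sizeof(padded_counter) == CACHE_LINE,
              "each padded counter should fill exactly one cache line");

/* Sums n integers using NUM_SLOTS independent accumulators.
 * Slot s handles elements s, s + NUM_SLOTS, s + 2*NUM_SLOTS, ... */
long sum_padded(const int *data, int n)
{
    padded_counter partial[NUM_SLOTS] = {{0}};

    /* The pragma is only emitted when the compiler has OpenMP enabled;
     * otherwise the loop runs serially with the same result. */
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (int s = 0; s < NUM_SLOTS; s++) {
        for (int i = s; i < n; i += NUM_SLOTS) {
            partial[s].value += data[i];  /* each slot has a single writer */
        }
    }

    /* Serial reduction of the per-slot partial sums. */
    long total = 0;
    for (int s = 0; s < NUM_SLOTS; s++) {
        total += partial[s].value;
    }
    return total;
}
```

Replacing `padded_counter` with `packed_counter` in `sum_padded` gives the same result but places all accumulators on one or two cache lines, which is the layout a false sharing detector would be expected to flag. Whether the aligned version is measurably faster depends on the hardware and thread count.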
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wicaksono, B., Tolubaeva, M., Chapman, B. (2013). Detecting False Sharing in OpenMP Applications Using the DARWIN Framework. In: Rajopadhye, S., Mills Strout, M. (eds) Languages and Compilers for Parallel Computing. LCPC 2011. Lecture Notes in Computer Science, vol 7146. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36036-7_19
DOI: https://doi.org/10.1007/978-3-642-36036-7_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36035-0
Online ISBN: 978-3-642-36036-7
eBook Packages: Computer Science (R0)