Abstract
For many applications in scientific computing, reduction operations can become a performance bottleneck. This article investigates the performance of different coarse- and fine-grained methods for implementing reductions. Fine-grained reductions using atomic operations or fine-grained explicit locks are compared with the coarse-grained reduction operations provided by OpenMP and MPI.
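To make the coarse- versus fine-grained distinction concrete, the following is a minimal sketch, not the authors' implementation, of an array-based reduction of the kind that occurs in FEM assembly. It assumes a hypothetical mesh in which element `e` adds `val[e]` to its two nodes `conn[2*e]` and `conn[2*e+1]`. The names `reduce_coarse`, `reduce_atomic`, `conn` and `val` are illustrative only. The coarse-grained variant uses the OpenMP array-section `reduction` clause (OpenMP 4.5 or later). The fine-grained variant uses `#pragma omp atomic` on every update.

```c
/* Hypothetical FEM-style assembly: element e adds val[e] to its two nodes
   conn[2*e] and conn[2*e+1] in the global array y of length n. Without
   -fopenmp the pragmas are ignored and both functions run serially. */

/* Coarse-grained: OpenMP array-section reduction (OpenMP 4.5 or later).
   Each thread accumulates into a private, zero-initialised copy of y; the
   copies are added into y once, at the end of the loop. */
void reduce_coarse(double *y, int n, const int *conn, const double *val,
                   int n_elem)
{
    #pragma omp parallel for reduction(+ : y[:n])
    for (int e = 0; e < n_elem; ++e) {
        y[conn[2 * e]]     += val[e];
        y[conn[2 * e + 1]] += val[e];
    }
}

/* Fine-grained: every single update is an atomic read-modify-write on the
   shared array, so no private copies and no final combine step are needed. */
void reduce_atomic(double *y, const int *conn, const double *val, int n_elem)
{
    #pragma omp parallel for
    for (int e = 0; e < n_elem; ++e) {
        #pragma omp atomic
        y[conn[2 * e]] += val[e];
        #pragma omp atomic
        y[conn[2 * e + 1]] += val[e];
    }
}
```

The trade-off is visible in the sketch. The coarse-grained version needs a private copy of the whole array per thread and a combine step after the loop. The atomic version avoids both, but it pays for a synchronised memory operation on every update.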
The reduction operations investigated are used in an adaptive finite element method (FEM). The performance results show that applications can gain a speedup from fine-grained reduction, because this implementation allows the reduction to be interleaved with the computation while minimising the time spent waiting for synchronisation.
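The following minimal sketch illustrates how a fine-grained, lock-based reduction lets the update be interleaved with the computation. It is an assumption-laden illustration, not the method from the chapter. Each thread computes an element contribution and immediately adds it to the shared array under a lock that guards a block of entries, so no separate reduction phase follows the loop. The kernel `element_contribution`, the block size `LOCK_BLOCK` and the function `assemble_locked` are hypothetical. The stub lock functions exist only so that the sketch also compiles without `-fopenmp`.

```c
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallback so the sketch also builds without -fopenmp:
   locks degenerate to no-ops. */
typedef int omp_lock_t;
static void omp_init_lock(omp_lock_t *l)    { (void)l; }
static void omp_destroy_lock(omp_lock_t *l) { (void)l; }
static void omp_set_lock(omp_lock_t *l)     { (void)l; }
static void omp_unset_lock(omp_lock_t *l)   { (void)l; }
#endif

#define LOCK_BLOCK 16 /* hypothetical: consecutive entries guarded by one lock */

/* Hypothetical per-element kernel standing in for the real FEM work. */
static double element_contribution(int e) { return (double)(e + 1); }

/* Fine-grained reduction: each thread computes an element contribution and
   immediately adds it to y under the lock of the affected block of entries,
   so updates by one thread overlap with computation on other threads.
   Returns 0 on success, -1 if the lock array cannot be allocated. */
int assemble_locked(double *y, int n, const int *conn, int n_elem)
{
    int n_locks = (n + LOCK_BLOCK - 1) / LOCK_BLOCK;
    omp_lock_t *locks = malloc((size_t)n_locks * sizeof *locks);
    if (locks == NULL && n_locks > 0) return -1;
    for (int i = 0; i < n_locks; ++i) omp_init_lock(&locks[i]);

    #pragma omp parallel for
    for (int e = 0; e < n_elem; ++e) {
        double c = element_contribution(e);   /* computation */
        for (int k = 0; k < 2; ++k) {          /* immediate scatter-add */
            int node = conn[2 * e + k];
            omp_set_lock(&locks[node / LOCK_BLOCK]);
            y[node] += c;
            omp_unset_lock(&locks[node / LOCK_BLOCK]);
        }
    }

    for (int i = 0; i < n_locks; ++i) omp_destroy_lock(&locks[i]);
    free(locks);
    return 0;
}
```

The block size sets the granularity. One lock per entry minimises contention but costs more memory. One lock for the whole array degenerates into a coarse-grained critical section.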
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Balg, M., Lang, J., Meyer, A., Rünger, G. (2013). Array-Based Reduction Operations for a Parallel Adaptive FEM. In: Keller, R., Kramer, D., Weiss, JP. (eds) Facing the Multicore-Challenge III. Lecture Notes in Computer Science, vol 7686. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35893-7_3
DOI: https://doi.org/10.1007/978-3-642-35893-7_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35892-0
Online ISBN: 978-3-642-35893-7
eBook Packages: Computer Science, Computer Science (R0)