Improving Perfect Parallelism

Karlsson, Lars; Kjelgaard Mikkelsen, Carl Christian; Kågström, Bo

doi:10.1007/978-3-642-55224-3_8

Conference paper
First Online: 01 January 2014

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8384)

Abstract

We reconsider the familiar problem of executing a perfectly parallel workload consisting of \(N\) independent tasks on a parallel computer with \(P \ll N\) processors. We show that there are memory-bound problems for which the runtime can be reduced by the forced parallelization of individual tasks across a small number of cores. Specific examples include solving differential equations, performing sparse matrix–vector multiplications, and sorting integer keys.
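As a rough illustration of the two execution modes contrasted in the abstract, the sketch below assigns \(N\) independent tasks either to single cores or to small groups of cores that each work on one task at a time. It is not the authors' implementation: the function names, the static round-robin assignment, and the requirement that the group size divides \(P\) are assumptions made for this example, and the sketch models only the task-to-group mapping, not runtime or memory traffic.

```python
from typing import Dict, List


def group_schedule(num_tasks: int, num_cores: int, group_size: int) -> Dict[int, List[int]]:
    """Map task indices to core groups (hypothetical helper, not from the paper).

    group_size == 1 corresponds to the conventional mode: one task per core.
    group_size > 1 corresponds to parallelizing each task across that many cores.
    """
    if group_size < 1 or num_cores % group_size != 0:
        raise ValueError("group_size must be a positive divisor of num_cores")
    num_groups = num_cores // group_size
    schedule: Dict[int, List[int]] = {g: [] for g in range(num_groups)}
    for task in range(num_tasks):
        # Static round-robin assignment (an assumption made for simplicity).
        schedule[task % num_groups].append(task)
    return schedule


def concurrent_tasks(num_cores: int, group_size: int) -> int:
    """Number of tasks in flight at once, i.e. the number of core groups."""
    return num_cores // group_size


if __name__ == "__main__":
    N, P = 16, 8
    for k in (1, 2, 4):
        sched = group_schedule(N, P, k)
        print(f"group size {k}: {concurrent_tasks(P, k)} tasks in flight, "
              f"group 0 runs tasks {sched[0]}")
```

With larger groups, fewer tasks are in flight at the same time. One plausible reading of the abstract, stated here as an assumption rather than as a result of the paper, is that this reduces the combined working set competing for shared cache and memory bandwidth.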

Notes

1. The L3 cache on Abisko is a non-inclusive victim cache, hence the addition.
2. Note that the effective memory bandwidth is a tool used to illustrate the time measurements and does not reflect the memory bandwidth that is actually consumed at the hardware level.

Acknowledgements

Financial support was provided by the Swedish Research Council (grant VR A0581501) and by eSSENCE, a strategic collaborative eScience programme. This research was conducted using the resources of HPC2N and NSC.

Author information

Authors and Affiliations

Department of Computing Science and HPC2N, Umeå University, Umeå, Sweden
Lars Karlsson, Carl Christian Kjelgaard Mikkelsen & Bo Kågström

Corresponding author

Correspondence to Lars Karlsson.

Editor information

Editors and Affiliations

Institute of Computer and Information Science, Czestochowa University of Technology, Czestochowa, Poland
Roman Wyrzykowski
University of Tennessee, Department of Computer Science, Knoxville, Tennessee, USA
Jack Dongarra
Institute of Computer and Information Science, Czestochowa University of Technology, Czestochowa, Poland
Konrad Karczewski
Technical University of Denmark Informatics and Mathematical Modelling, Kongens Lyngby, Denmark
Jerzy Waśniewski

About this paper

Cite this paper

Karlsson, L., Kjelgaard Mikkelsen, C., Kågström, B. (2014). Improving Perfect Parallelism. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Waśniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2013. Lecture Notes in Computer Science, vol 8384. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-55224-3_8

DOI: https://doi.org/10.1007/978-3-642-55224-3_8
Published: 06 May 2014
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-55223-6
Online ISBN: 978-3-642-55224-3
eBook Packages: Computer Science, Computer Science (R0)
