Revisiting the Cache Miss Analysis of Multithreaded Algorithms

Cole, Richard; Ramachandran, Vijaya

doi:10.1007/978-3-642-29344-3_15

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7256)

Included in the following conference series:

Latin American Symposium on Theoretical Informatics

Abstract

This paper revisits the cache miss analysis of algorithms when scheduled using randomized work stealing (RWS) in a parallel environment where processors have private caches. We focus on the effect of task migration on cache miss costs, and in particular, the costs of accessing “hidden” data typically stored on execution stacks (such as the return location for a recursive call).

Prior analyses, with the exception of [1], do not account for such costs, and it is not clear how to extend them to account for these costs. By means of a new analysis, we show that for a variety of basic algorithms these task migration costs are no larger than the costs for the remainder of the computation, and thereby recover existing bounds. We also analyze a number of algorithms implicitly analyzed by [1], namely Scans (including Prefix Sums and Matrix Transposition), Matrix Multiply (the depth n in-place algorithm, the standard 8-way divide and conquer algorithm, and Strassen’s algorithm), I-GEP, finding a longest common subsequence, FFT, the SPMS sorting algorithm, list ranking and graph connected components; we obtain sharper bounds in many cases.

While this paper focuses on the RWS scheduler, the bounds we obtain are a function of the number of steals, and thus would apply to any scheduler given bounds on the number of steals it induces.
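
To make the role of the steal count concrete, here is a minimal toy simulation of randomized work stealing on a binary fork-join computation. It is only a sketch under stated assumptions, not the model or analysis used in the paper: the function name `simulate_rws`, the parameters `n`, `p`, `grain` and `seed`, and the round-based execution are all hypothetical choices made for illustration, and the simulation counts steals only (it does not model caches or cache misses).

```python
import random
from collections import deque


def simulate_rws(n, p, grain=1, seed=0):
    """Toy randomized work stealing for a fork-join sum over range(n).

    Hypothetical illustration only: tasks are half-open ranges (lo, hi),
    each of the p simulated processors owns a deque, and idle processors
    pick a victim uniformly at random. Returns (total, steals).
    """
    rng = random.Random(seed)
    deques = [deque() for _ in range(p)]
    deques[0].append((0, n))      # the root task starts on processor 0
    total = 0
    steals = 0
    remaining = n                 # elements not yet added to the total
    while remaining > 0:
        for proc in range(p):
            dq = deques[proc]
            if dq:
                lo, hi = dq.pop()             # owner works at the bottom
                if hi - lo <= grain:
                    total += sum(range(lo, hi))
                    remaining -= hi - lo
                else:
                    mid = (lo + hi) // 2
                    dq.append((mid, hi))      # forked half, exposed to thieves
                    dq.append((lo, mid))      # half the owner runs next
            else:
                victim = rng.randrange(p)
                if victim != proc and deques[victim]:
                    # A thief takes the oldest task from the top of the
                    # victim's deque; this is one task migration.
                    dq.append(deques[victim].popleft())
                    steals += 1
    return total, steals


if __name__ == "__main__":
    total, steals = simulate_rws(n=1 << 12, p=8, grain=16)
    print("sum:", total, "steals:", steals)
```

Each successful steal in this sketch corresponds to one task migration, which is the event whose cache miss cost the abstract discusses; a different scheduler would change only how the victim is chosen and hence how many steals occur.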

Richard Cole (cole@cs.nyu.edu) was supported in part by NSF Grant CCF-0830516. Vijaya Ramachandran (vlr@cs.utexas.edu) was supported in part by NSF Grant CCF-0830737.

References

[1] Acar, U.A., Blelloch, G.E., Blumofe, R.D.: The data locality of work stealing. Theory of Computing Systems 35(3), 321–347 (2002)
[2] Blumofe, R., Leiserson, C.E.: Scheduling multithreaded computations by work stealing. JACM, 720–748 (1999)
[3] Blumofe, R.D., Joerg, C.F., Kuszmaul, B.C., Leiserson, C.E., Randall, K.H., Zhou, Y.: Cilk: An efficient multithreaded runtime system. SIGPLAN Not. 30, 207–216 (1995)
[4] Burton, F.W., Sleep, M.R.: Executing functional programs on a virtual tree of processors. In: Proc. ACM Conference on Functional Programming Languages and Computer Architecture, pp. 187–194 (1981)
[5] Chowdhury, R., Ramachandran, V.: Cache-oblivious dynamic programming. In: Proc. of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2006, pp. 591–600 (2006)
[6] Chowdhury, R., Ramachandran, V.: The cache-oblivious Gaussian Elimination Paradigm: Theoretical framework, parallelization and experimental evaluation. Theory of Comput. Syst. 47(1), 878–919 (2010)
[7] Chowdhury, R.A., Ramachandran, V.: Cache-efficient dynamic programming algorithms for multicores. In: Proc. of the Twentieth Annual ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2008, pp. 207–216 (2008)
[8] Chowdhury, R.A., Silvestri, F., Blakeley, B., Ramachandran, V.: Oblivious algorithms for multicores and network of processors. In: Proc. 2010 IEEE International Symposium on Parallel & Distributed Processing, IPDPS 2010, pp. 1–12 (2010)
[9] Cole, R., Ramachandran, V.: Resource Oblivious Sorting on Multicores. In: Abramsky, S., Gavoille, C., Kirchner, C., Meyer auf der Heide, F., Spirakis, P.G. (eds.) ICALP 2010. LNCS, vol. 6198, pp. 226–237. Springer, Heidelberg (2010)
[10] Cole, R., Ramachandran, V.: Analysis of randomized work stealing with false sharing. CoRR, abs/1103.4142 (2011)
[11] Cole, R., Ramachandran, V.: Efficient resource oblivious algorithms for multicores with false sharing. In: Proc. IEEE IPDPS (to appear, 2012)
[12] Cormen, T., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 3rd edn. MIT Press (2009)
[13] Frigo, M., Leiserson, C.E., Prokop, H., Ramachandran, S.: Cache-oblivious algorithms. In: Proc. Fortieth Annual Symposium on Foundations of Computer Science, FOCS 1999, pp. 285–297 (1999)
[14] Frigo, M., Strumpen, V.: The cache complexity of multithreaded cache oblivious algorithms. Theory Comput. Syst. 45, 203–233 (2009)
[15] Gautier, T., Besseron, X., Pigeon, L.: Kaapi: A thread scheduling runtime system for data flow computations on cluster of multi-processors. In: Proc. International Workshop on Parallel Symbolic Computation, PASCO 2007, pp. 15–23 (2007)
[16] Halstead, R.H.J.: Implementation of Multilisp: Lisp on a multiprocessor. In: Proc. ACM Symposium on LISP and Functional Programming, pp. 9–17 (1984)
[17] Robison, A., Voss, M., Kukanov, A.: Optimization via reflection on work stealing in TBB. In: Proc. IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2008, pp. 1–8 (2008)

Author information

Authors and Affiliations

Computer Science Dept., Courant Institute, NYU, New York, NY, 10012, USA
Richard Cole
Dept. of Computer Science, University of Texas, Austin, TX, 78712, USA
Vijaya Ramachandran

Editor information

Editors and Affiliations

Department of Computer Science, Iowa State University, 50011, Ames, IA, USA
David Fernández-Baca

About this paper

Cite this paper

Cole, R., Ramachandran, V. (2012). Revisiting the Cache Miss Analysis of Multithreaded Algorithms. In: Fernández-Baca, D. (eds) LATIN 2012: Theoretical Informatics. LATIN 2012. Lecture Notes in Computer Science, vol 7256. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29344-3_15

DOI: https://doi.org/10.1007/978-3-642-29344-3_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29343-6
Online ISBN: 978-3-642-29344-3
eBook Packages: Computer Science, Computer Science (R0)
