Abstract
With the increasing demand for safe and (semi-)automated parallelization of software, the scheduling of automatically generated task graphs is becoming ever more important. Previous static scheduling algorithms assume that the run-time overhead of spawning and joining tasks is negligible. We show that this overhead is significant for the small- to medium-sized tasks that are common both in automatically generated task graphs and in existing parallel applications.
By comparing the real-world execution times of schedules with the predicted static schedule lengths, we show that the static schedule lengths are uncorrelated with the measured execution times and underestimate the execution times of task graphs containing small tasks by factors of up to one thousand. The static schedules are realistic only in the limiting case where all tasks are vastly larger than the scheduling overhead. Thus, for tasks that are not large, the real-world speedup achieved with these algorithms may be arbitrarily poor, possibly occupying many cores while realizing a speedup of less than one, irrespective of any theoretical guarantees given for these algorithms. This is especially harmful on battery-driven devices that would otherwise shut down unused cores.
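The spawn/join overhead referred to above can be observed directly. The following micro-benchmark is an illustrative sketch, not the paper's measurement setup: it pays a full thread spawn and join for every tiny task and compares this against running the same tasks serially, yielding a rough per-task overhead estimate.

```python
import threading
import time

def tiny_task():
    # A deliberately small task body: a few arithmetic operations.
    return sum(i * i for i in range(100))

N = 200

# Serial baseline: run the tiny tasks without any scheduling machinery.
t0 = time.perf_counter()
for _ in range(N):
    tiny_task()
serial = time.perf_counter() - t0

# Version that pays a full spawn/join per task. The threads run one at
# a time (start() is immediately followed by join()), so the difference
# to the baseline is pure scheduling overhead, not parallel work.
t0 = time.perf_counter()
for _ in range(N):
    t = threading.Thread(target=tiny_task)
    t.start()
    t.join()
spawned = time.perf_counter() - t0

overhead_per_task = (spawned - serial) / N
print(f"serial: {serial:.4f}s, spawn/join: {spawned:.4f}s, "
      f"overhead per task: {overhead_per_task * 1e6:.1f} us")
```

On typical hardware the per-task overhead is tens of microseconds, i.e. far larger than the task body itself, which is exactly the regime in which static schedule lengths become unrealistic.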
We derive a model that predicts parallel task execution times on symmetric schedulers, i.e., schedulers whose run-time scheduling overhead is homogeneous. The soundness of the model is verified by comparing the static and real-world overhead of different run-time schedulers. Finally, we present the first clustering algorithm that guarantees a real-world speedup, by clustering all parallel tasks in the task graph that cannot be executed efficiently in parallel. Our algorithm considers both the specific target hardware and the scheduler implementation, and runs in time cubic in the size of the task graph.
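As a toy illustration of such an overhead-aware model (the function names, the overhead symbol `sigma`, and the amortization threshold below are illustrative assumptions, not the paper's notation or algorithm), one can charge every scheduled unit its work plus one spawn/join overhead, and then merge small tasks into sequential clusters until each cluster amortizes that overhead:

```python
# Toy overhead-aware execution-time model for independent tasks.
# sigma is the per-spawn/join overhead; tasks and sigma share one
# arbitrary time unit. Illustrative sketch only.

def predicted_time(tasks, p, sigma):
    """Greedy list-schedule estimate on p cores: each scheduled unit
    costs its own work plus one spawn/join overhead sigma."""
    loads = [0.0] * p
    for t in sorted(tasks, reverse=True):
        i = loads.index(min(loads))  # place on least-loaded core
        loads[i] += t + sigma
    return max(loads)

def cluster(tasks, sigma, factor=10.0):
    """Merge tasks into sequential clusters until each cluster is at
    least `factor` times the overhead, amortizing the spawn cost."""
    clusters, current = [], 0.0
    for t in tasks:
        current += t
        if current >= factor * sigma:
            clusters.append(current)
            current = 0.0
    if current > 0.0:
        clusters.append(current)
    return clusters

tasks = [0.001] * 1000  # 1000 tiny tasks of 1 ms each
sigma = 0.005           # 5 ms spawn/join overhead per task
serial = sum(tasks)

naive = predicted_time(tasks, p=4, sigma=sigma)
clustered = predicted_time(cluster(tasks, sigma), p=4, sigma=sigma)

print(f"serial {serial:.3f}s, naive parallel {naive:.3f}s, "
      f"clustered parallel {clustered:.3f}s")
```

With these numbers the naive parallel estimate is slower than serial execution (a real-world speedup below one), while clustering restores a genuine speedup, mirroring the effect the clustering algorithm exploits.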
Our results are confirmed by applying our algorithm to a large set of randomly generated benchmark task graphs.
Notes
1. Intel(R) Core(TM) i7 CPU 860 @ 2.80GHz, SMP x86_64, GNU/Linux 3.5.0-37-generic.
2. Intel(R) Core(TM) i7-3667U CPU @ 2.00GHz, SMP x86_64, GNU/Linux 3.5.0-17-generic.
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this paper
Herz, A., Pinkau, C. (2015). Real-World Clustering for Task Graphs on Shared Memory Systems. In: Cirne, W., Desai, N. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2014. Lecture Notes in Computer Science(), vol 8828. Springer, Cham. https://doi.org/10.1007/978-3-319-15789-4_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-15788-7
Online ISBN: 978-3-319-15789-4
eBook Packages: Computer Science (R0)