Hierarchical Scheduling of DAG Structured Computations on Manycore Processors with Dynamic Thread Grouping

Xia, Yinglong; Prasanna, Viktor K.; Li, James

doi:10.1007/978-3-642-16505-4_9

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6253)

Included in the following conference series:

Workshop on Job Scheduling Strategies for Parallel Processing

Abstract

Many computational solutions can be expressed as directed acyclic graphs (DAGs) with weighted nodes. In parallel computing, scheduling such DAGs onto manycore processors remains a fundamental challenge, because synchronizing dozens of threads while preserving precedence constraints can dramatically degrade performance. To improve scheduling performance on manycore processors, we propose a hierarchical scheduling method with dynamic thread grouping, which schedules DAG-structured computations at three levels. At the top level, a supermanager separates threads into groups, each consisting of a manager thread and several worker threads. The supermanager dynamically merges and partitions the groups to adapt the scheduler to the input task dependency graph. Through group merging and partitioning, the scheduler can dynamically become a centralized scheduler, a distributed scheduler, or something in between, depending on the input graph. At the group level, managers collaboratively schedule tasks for their workers. At the within-group level, workers perform self-scheduling within their respective groups and execute tasks. We evaluate the proposed scheduler on the Sun UltraSPARC T2 (Niagara 2) platform, which supports up to 64 hardware threads. Across various input task dependency graphs, the proposed scheduler outperforms several baseline methods, including typical centralized and distributed schedulers.
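
The grouping idea in the abstract can be illustrated with a small sequential sketch. It is not the authors' implementation: the function names (`make_groups`, `merge`, `partition`, `ready_order`), the convention that the first thread id in a group acts as its manager, and the halving rule for partitioning are all assumptions made for illustration. The sketch shows how merging every group yields a single (centralized) group, how partitioning moves toward a distributed layout, and how a ready queue releases tasks only after their predecessors finish. Node weights and the supermanager thread itself are left out.

```python
from collections import deque


def make_groups(num_threads, group_size):
    """Split thread ids into groups; by assumption, the first id in each group is its manager."""
    ids = list(range(num_threads))
    return [ids[i:i + group_size] for i in range(0, num_threads, group_size)]


def merge(groups, i, j):
    """Return a new grouping in which groups i and j are combined into one group."""
    if i == j:
        raise ValueError("cannot merge a group with itself")
    merged = groups[i] + groups[j]
    rest = [g for k, g in enumerate(groups) if k not in (i, j)]
    return rest + [merged]


def partition(groups, i):
    """Return a new grouping in which group i is split into two halves (hypothetical rule)."""
    g = groups[i]
    if len(g) < 2:
        return [list(x) for x in groups]  # nothing to split
    mid = len(g) // 2
    rest = [x for k, x in enumerate(groups) if k != i]
    return rest + [g[:mid], g[mid:]]


def ready_order(nodes, edges):
    """Return one execution order that respects the precedence constraints of a DAG."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1
    # Tasks with no unfinished predecessors are ready to run.
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for nxt in successors[task]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(indegree):
        raise ValueError("graph contains a cycle")
    return order


if __name__ == "__main__":
    groups = make_groups(8, 4)
    print(groups)                # two groups of four threads
    print(merge(groups, 0, 1))   # one group: behaves like a centralized scheduler
    print(partition(groups, 0))  # three groups: closer to a distributed scheduler
    print(ready_order(["a", "b", "c", "d"],
                      [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]))
```

In a real scheduler the decision of when to merge or partition would depend on the input task dependency graph and on observed load; the abstract does not state the criteria, so this sketch leaves that decision to the caller.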

This research was partially supported by the National Science Foundation under grant number CNS-0613376. NSF equipment grant CNS-0454407 is gratefully acknowledged.

Author information

Authors and Affiliations

Department of Computer Science, University of Southern California, Los Angeles, CA, 90089, U.S.A.
Yinglong Xia & Viktor K. Prasanna
Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA, 90089, U.S.A.
Viktor K. Prasanna & James Li

Editor information

Editors and Affiliations

Facebook, 475 Brannan St., 94107, San Francisco, CA, USA
Eitan Frachtenberg
Robotics Research Institute, Section Information Technology, TU Dortmund University, Otto-Hahn-Str. 8, 44227, Dortmund, Germany
Uwe Schwiegelshohn

About this paper

Cite this paper

Xia, Y., Prasanna, V.K., Li, J. (2010). Hierarchical Scheduling of DAG Structured Computations on Manycore Processors with Dynamic Thread Grouping. In: Frachtenberg, E., Schwiegelshohn, U. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2010. Lecture Notes in Computer Science, vol 6253. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16505-4_9

DOI: https://doi.org/10.1007/978-3-642-16505-4_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16504-7
Online ISBN: 978-3-642-16505-4
eBook Packages: Computer Science, Computer Science (R0)
