Euro-Par 2013: Euro-Par 2013 Parallel Processing, pp. 647-658
Model and Complexity Results for Tree Traversals on Hybrid Platforms
Abstract
We study the complexity of traversing tree-shaped workflows whose tasks require large I/O files. We target a heterogeneous architecture with two resources of different types, each equipped with its own memory, such as a multicore node equipped with a dedicated accelerator (FPGA or GPU). Tasks in the workflow are tagged with the type of resource that is needed for their processing. Moreover, a task can be processed on a given resource only if all its input files and output files can be stored in the corresponding memory. At a given execution step, the amount of data stored in each memory strongly depends on the order in which the tasks are executed, and on when communications between the two memories are scheduled. The objective is to determine an efficient traversal that minimizes the maximum amount of memory of each type needed to traverse the whole tree. In this paper, we establish the complexity of this two-memory scheduling problem, provide inapproximability results, and show how to determine the optimal depth-first traversal. Altogether, these results lay the foundations for memory-aware scheduling algorithms on heterogeneous platforms.
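To make the dependence on task ordering concrete, the following is a minimal sketch of a simplified version of the problem, not the model defined in the paper. It assumes that each task produces one output file, that running a task needs its children's files and its own output in the memory of its type, and that an output file is moved to the parent's memory as soon as the task completes (so the choice of when to communicate, which the abstract says also matters, is ignored here). The names `Task`, `peak_memory`, and the colors "blue" and "red" are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    color: str      # memory/resource type, "blue" or "red" (illustrative names)
    out_size: int   # size of the single output file (simplifying assumption)
    children: list = field(default_factory=list)


def peak_memory(tasks, order):
    """Peak usage of each memory for a given traversal order.

    Assumption: an output file is transferred to the parent's memory
    immediately after its task completes.
    """
    parent = {c: name for name, t in tasks.items() for c in t.children}
    used = {"blue": 0, "red": 0}
    peak = {"blue": 0, "red": 0}
    done = set()
    for name in order:
        task = tasks[name]
        if any(c not in done for c in task.children):
            raise ValueError(f"{name} scheduled before one of its children")
        mem = task.color
        # Inputs already sit in this memory; allocate room for the output.
        used[mem] += task.out_size
        peak[mem] = max(peak[mem], used[mem])
        # Inputs are consumed, and the output leaves this task's memory.
        used[mem] -= sum(tasks[c].out_size for c in task.children)
        used[mem] -= task.out_size
        # The output now waits in the memory of the parent's type (if any).
        if name in parent:
            dest = tasks[parent[name]].color
            used[dest] += task.out_size
            peak[dest] = max(peak[dest], used[dest])
        done.add(name)
    return peak


# A small made-up tree: root "r" with subtrees a <- x and b <- y.
tasks = {
    "x": Task("blue", 5),
    "a": Task("blue", 1, ["x"]),
    "y": Task("red", 4),
    "b": Task("blue", 4, ["y"]),
    "r": Task("blue", 1, ["a", "b"]),
}

first = peak_memory(tasks, ["x", "a", "y", "b", "r"])
second = peak_memory(tasks, ["y", "b", "x", "a", "r"])
print(first, second)
```

On this made-up instance the two valid traversals need different amounts of "blue" memory (9 versus 10 units) while the "red" peak stays at 4, which illustrates why the traversal order is part of the optimization problem.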
Keywords
Task Graph, Memory Type, Blue Node, Communication Node, Tree Traversal