StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures
In the field of HPC, the current hardware trend is to design multiprocessor architectures that feature heterogeneous technologies such as specialized coprocessors (e.g. Cell/BE SPUs) or data-parallel accelerators (e.g. GPGPUs).
Approaching the theoretical performance of these architectures is a complex issue. Indeed, substantial efforts have already been devoted to efficiently offloading parts of the computations. However, designing an execution model that unifies all computing units and their associated embedded memory remains a major challenge.
We have thus designed StarPU, an original runtime system providing a high-level, unified execution model tightly coupled with an expressive data management library. The main goal of StarPU is to provide numerical kernel designers with a convenient way to generate parallel tasks over heterogeneous hardware on the one hand, and to easily develop and tune powerful scheduling algorithms on the other hand.
We have developed several strategies that can be selected seamlessly at run time, and we have demonstrated their efficiency by analyzing the impact of those scheduling policies on several classical linear algebra algorithms that take advantage of multiple cores and GPUs at the same time. In addition to substantial improvements regarding execution times, we obtained consistent superlinear parallelism by actually exploiting the heterogeneous nature of the machine.
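To make the idea of run-time selectable scheduling strategies concrete, the following is a minimal Python sketch and not StarPU's actual API or any of its policies. All names (`Worker`, `Task`, `greedy_policy`, `earliest_finish_policy`, `schedule`) and the per-kind cost estimates are assumptions introduced for illustration only. It contrasts a policy that ignores the kind of processing unit with one that uses estimated execution times on each kind of unit.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Worker:
    name: str
    kind: str               # hypothetical unit kind: "cpu" or "gpu"
    ready_at: float = 0.0   # time at which this worker becomes idle


@dataclass
class Task:
    name: str
    cost: Dict[str, float]  # assumed execution-time estimate per worker kind


Policy = Callable[[Task, List[Worker]], Worker]


def greedy_policy(task: Task, workers: List[Worker]) -> Worker:
    """Pick the worker that becomes idle first, ignoring the task's cost."""
    return min(workers, key=lambda w: w.ready_at)


def earliest_finish_policy(task: Task, workers: List[Worker]) -> Worker:
    """Pick the worker on which the task is estimated to finish earliest."""
    return min(workers, key=lambda w: w.ready_at + task.cost[w.kind])


# Policies are looked up by name, so the strategy can be chosen at run time.
POLICIES: Dict[str, Policy] = {
    "greedy": greedy_policy,
    "eft": earliest_finish_policy,
}


def schedule(tasks: List[Task], workers: List[Worker],
             policy_name: str) -> Tuple[float, List[Tuple[str, str]]]:
    """Assign independent tasks in order; return (makespan, placement)."""
    policy = POLICIES[policy_name]
    placement = []
    for task in tasks:
        worker = policy(task, workers)
        worker.ready_at += task.cost[worker.kind]
        placement.append((task.name, worker.name))
    return max(w.ready_at for w in workers), placement


def make_workers() -> List[Worker]:
    return [Worker("cpu0", "cpu"), Worker("cpu1", "cpu"), Worker("gpu0", "gpu")]


def make_tasks() -> List[Task]:
    # Four identical tasks that are assumed to run 10x faster on the GPU.
    return [Task(f"t{i}", {"cpu": 10.0, "gpu": 1.0}) for i in range(4)]


if __name__ == "__main__":
    for name in POLICIES:
        makespan, placement = schedule(make_tasks(), make_workers(), name)
        print(name, makespan, placement)
```

With these made-up costs, the cost-unaware policy places two tasks on the slow CPU workers and finishes at time 10, while the cost-aware policy keeps all four tasks on the GPU and finishes at time 4. The sketch only illustrates why a pluggable policy interface is useful on a heterogeneous machine; it omits data transfers and task dependencies, both of which a real runtime must handle.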