Predictive Runtime Code Scheduling for Heterogeneous Architectures

  • Víctor J. Jiménez
  • Lluís Vilanova
  • Isaac Gelado
  • Marisa Gil
  • Grigori Fursin
  • Nacho Navarro
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5409)


Heterogeneous architectures are currently widespread. With the advent of easy-to-program general-purpose GPUs, virtually every recent desktop computer is a heterogeneous system. Combining the CPU and the GPU brings great amounts of processing power. However, such architectures are often used in a restricted way, for domain-specific applications such as scientific computing and games, and they tend to be used by a single application at a time. We envision future heterogeneous computing systems in which all heterogeneous resources are continuously utilized by different applications with versioned critical parts, so that applications can adapt their behavior at runtime and improve execution time, power consumption, response time, and other constraints. Under such a model, adaptive scheduling becomes a critical component.

In this paper, we propose a novel predictive user-level scheduler for heterogeneous systems, based on past performance history. We developed several scheduling policies and present a study of their impact on system performance. We demonstrate that such a scheduler allows multiple applications to fully utilize all available processing resources in CPU/GPU-like systems, consistently achieving speedups of 30% to 40% over running a single application on the GPU alone.
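The core idea of a history-based predictive scheduler can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: it assumes each task kind has a running history of measured execution times per device, and dispatches each new task to the device with the lowest predicted completion time (predicted run time plus work already queued on that device). All class and method names here are hypothetical.

```python
from collections import defaultdict


class PredictiveScheduler:
    """Sketch of a history-based predictive scheduler for a CPU/GPU system."""

    def __init__(self, devices):
        self.devices = list(devices)              # e.g. ["cpu", "gpu"]
        self.history = defaultdict(list)          # (task, device) -> past times
        self.queued = {d: 0.0 for d in devices}   # outstanding work per device

    def predict(self, task, device):
        """Predict run time from the mean of past observations."""
        times = self.history[(task, device)]
        # With no history yet, predict 0 so the device gets explored first.
        return sum(times) / len(times) if times else 0.0

    def pick_device(self, task):
        """Choose the device minimizing predicted finish time."""
        return min(self.devices,
                   key=lambda d: self.queued[d] + self.predict(task, d))

    def submit(self, task):
        """Dispatch a task and account for its predicted load."""
        device = self.pick_device(task)
        self.queued[device] += self.predict(task, device)
        return device

    def record(self, task, device, elapsed):
        """Feed a measured execution time back into the history."""
        self.history[(task, device)].append(elapsed)
        self.queued[device] = max(0.0, self.queued[device] - elapsed)
```

A scheduler of this shape naturally load-balances: once the GPU's queue grows long enough, even tasks that run faster on the GPU are predicted to finish sooner on the otherwise idle CPU, which is how both devices end up continuously utilized.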





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Víctor J. Jiménez (1)
  • Lluís Vilanova (2)
  • Isaac Gelado (2)
  • Marisa Gil (2)
  • Grigori Fursin (3)
  • Nacho Navarro (2)
  1. Barcelona Supercomputing Center (BSC)
  2. Departament d’Arquitectura de Computadors (UPC)
  3. ALCHEMY Group, INRIA Futurs and LRI, Paris-Sud University