
Globally scheduled real-time multiprocessor systems with GPUs



Graphics processing units (GPUs) are powerful processors that can offer significant performance advantages over traditional CPUs. The last decade has seen rapid advancement in GPU computational power and generality, and recent technologies make it possible to use GPUs as co-processors to CPUs. GPUs often outperform traditional CPUs by orders of magnitude. While the motivations for developing systems with GPUs are clear, little research in the real-time systems field has been done to integrate GPUs into real-time multiprocessor systems. We present two real-time analysis methods, addressing real-world platform constraints, for such an integration into a soft real-time multiprocessor system and show that a GPU can be exploited to achieve greater levels of total system performance.




  1. Notable platforms include the Compute Unified Device Architecture (CUDA) from NVIDIA (CUDA Zone, URL), Stream from AMD/ATI (ATI Stream Technology, URL), OpenCL from Apple and the Khronos Group (OpenCL, URL), and DirectCompute from Microsoft (Microsoft DirectX, URL).

  2. China’s new Nebulae supercomputer is no. 2, URL

  3. Parallel computing with SciFinance, URL

  4. GeForce graphics processors, URL

  5. Intel microprocessor export compliance metrics, URL

  6. CUDA community showcase, URL

  7. AMD Fusion Family of APUs, URL

  8. Intel details 2011 processor features, offers stunning visuals built-in, URL

  9. The sample NVIDIA CUDA SDK programs were modified to use pinned memory, which prevents the affected memory segments from being paged to disk. Pinned memory can significantly reduce communication overheads because the system can take advantage of direct memory access (DMA) data transfers. For example, the communication-to-execution ratio for the eigenvalue program increases to about 30% without it.

  10. NVIDIA’s Fermi architecture allows limited simultaneous execution of kernels, provided that these kernels are sourced from the same host-side context/thread. We do not consider such uses in this work.

  11. The GTX-295 actually provides two independent GPUs on a single card, though only one GPU was used in this work.

  12. Some have recently speculated that the earliest-deadline-zero-laxity (EDZL) algorithm may be better suited to accounting for self-suspensions (caused, for example, by using a GPU) (Lakshmanan et al. 2010). However, actionable results have yet to be presented, so better suspension accounting remains an open problem.

  13. For performance, GPU operations may be performed asynchronously by the GPU-using job. This allows several GPU operations to be batched together and treated as a single operation, reducing the number of times the job must suspend to wait for GPU results. No changes to our task model are necessary to support this type of operation.

  14. A window-constrained scheduling algorithm prioritizes a job by a time point contained within an interval window that also contains the job’s release and deadline.

  15. Common workload profiles were solicited from research groups at UNC that frequently make use of CUDA. An informal poll was also taken at the NVIDIA CUDA online forums. Similar timing characteristics were later confirmed in the domain of computer vision for real-time automotive applications (Muyan-Ozcelik et al. 2011).

  16. Some graphs appear to be missing data points at lower and upper system utilization ranges. This is caused by the occasional inability to generate task sets meeting particular scenario constraints, usually because no task set with at least two GPU-using tasks could be generated under the given constraints.

  17. Graphs for all scenarios are available at

  18. k-exclusion locks protect a resource or resource pool, allowing up to k simultaneous accesses.
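
As a rough illustration of the k-exclusion idea in note 18, the sketch below guards a pool of k identical resources with a counting semaphore, so that at most k holders are inside at once. It is not the locking protocol analyzed in the article: a plain semaphore gives no real-time guarantees such as bounded blocking or priority-aware queueing. The names `KExclusionLock` and `max_concurrent_holders` are hypothetical helpers introduced only for this example.

```python
import threading
import time


class KExclusionLock:
    """Admits at most k simultaneous holders (illustrative helper only)."""

    def __init__(self, k):
        if k < 1:
            raise ValueError("k must be at least 1")
        # A counting semaphore with k permits models a pool of k resources.
        self._slots = threading.Semaphore(k)

    def __enter__(self):
        self._slots.acquire()  # blocks while all k slots are taken
        return self

    def __exit__(self, exc_type, exc, tb):
        self._slots.release()
        return False


def max_concurrent_holders(k, num_workers, hold_seconds=0.01):
    """Run num_workers threads through the lock; return the peak holder count."""
    lock = KExclusionLock(k)
    guard = threading.Lock()  # protects the bookkeeping counters below
    state = {"current": 0, "peak": 0}

    def worker():
        with lock:
            with guard:
                state["current"] += 1
                state["peak"] = max(state["peak"], state["current"])
            time.sleep(hold_seconds)  # stand-in for using the shared resource
            with guard:
                state["current"] -= 1

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]


if __name__ == "__main__":
    # With k = 2, the observed peak never exceeds 2, however many workers compete.
    print(max_concurrent_holders(k=2, num_workers=8))
```

Setting k = 1 reduces the sketch to an ordinary mutual-exclusion lock, which is the usual way to think about a single resource that admits one user at a time.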


  1. Abhijeet G, Muni TI (2009) GPU based sparse grid technique for solving multidimensional options pricing PDEs. In: Proceedings of the 2nd workshop on high performance computational finance, pp 1–9

  2. Aila T, Laine S (2009) Understanding the efficiency of ray traversal on GPUs. In: Proceedings of the conference on high performance graphics, pp 145–149

  3. Baruah S (2000) Scheduling periodic tasks on uniform processors. In: Proceedings of the EuroMicro conference on real-time systems, pp 7–14

  4. Baruah S (2004) Feasibility analysis of preemptive real-time systems upon heterogeneous multiprocessor platforms. In: Proceedings of the 25th IEEE real-time systems symposium, pp 37–46

  5. Block A, Leontyev H, Brandenburg B, Anderson J (2007) A flexible real-time locking protocol for multiprocessors. In: Proceedings of the 13th IEEE international conference on embedded and real-time computing systems and applications, pp 47–57

  6. Brandenburg B, Anderson J (2010) Optimality results for multiprocessor real-time locking. In: Proceedings of the 31st IEEE real-time systems symposium, pp 49–60

  7. Calandrino J, Leontyev H, Block A, Devi U, Anderson J (2006) LITMUS^RT: A testbed for empirically comparing real-time multiprocessor schedulers. In: Proceedings of the 27th IEEE real-time systems symposium, pp 111–123

  8. Childs S, Ingram D (2001) The Linux-SRT integrated multimedia operating system: bringing QoS to the desktop. In: Proceedings of the 7th real-time technology and applications symposium, p 135

  9. Devi U, Anderson J (2008) Tardiness bounds under global EDF scheduling on a multiprocessor. Real-Time Syst 38:133–189

  10. Dwarakinath A (2008) A fair-share scheduler for the graphics processing unit. Master’s thesis, Stony Brook University

  11. Erickson J, Devi U, Baruah S (2010) Improved tardiness bounds for global EDF. In: Proceedings of the 22nd EuroMicro conference on real-time systems, pp 14–23

  12. Funk S, Goossens J, Baruah S (2001) On-line scheduling on uniform multiprocessors. In: Proceedings of the 22nd IEEE real-time systems symposium, pp 183–202

  13. Gai P, Abeni L, Buttazzo G (2002) Multiprocessor DSP scheduling in system-on-a-chip architectures. In: Proceedings of the 14th EuroMicro conference on real-time systems, pp 231–238

  14. Harrison O, Waldron J (2008) Practical symmetric key cryptography on modern graphics hardware. In: Proceedings of the 17th conference on security symposium, pp 195–209

  15. Kang W, Son SH, Stankovic JA, Amirijoo M (2007) I/O-aware deadline miss ratio management in real-time embedded databases. In: Proceedings of the 28th IEEE real-time systems symposium, pp 277–287

  16. Kato S, Ishikawa Y (2009) Gang EDF scheduling of parallel task systems. In: Proceedings of the 30th IEEE real-time systems symposium, pp 459–468

  17. Kato S, Lakshmanan K, Rajkumar R, Ishikawa Y (2011a) Resource sharing in GPU-accelerated windowing systems. In: Proceedings of the 17th IEEE real-time and embedded technology and application symposium

  18. Kato S, Lakshmanan K, Rajkumar R, Ishikawa Y (2011b) TimeGraph: GPU scheduling for real-time multi-tasking environments. In: Proceedings of the USENIX annual technical conference

  19. Lakshmanan K, Kato S, Rajkumar R (2010) Open problems in scheduling self-suspending tasks. In: Proceedings of the 1st international real-time scheduling open problems seminar, pp 12–13

  20. Leontyev H, Anderson J (2009) A hierarchical multiprocessor bandwidth reservation scheme with timing guarantees. Real-Time Syst 43(1):60–92

  21. Pellizzoni R, Lipari G (2007) Holistic analysis of asynchronous real-time transactions with earliest deadline scheduling. J Comput Syst Sci 73:186–206

  22. Manica N, Abeni L, Palopoli L (2008) QoS support in the X11 window system. In: Proceedings of the 14th IEEE real-time and embedded technology and applications symposium, pp 103–112

  23. Muyan-Ozcelik P, Glavtchev V, Ota JM, Owens JD (2011) Real-time speed-limit-sign recognition in an embedded system using a GPU. In: GPU Computing Gems, pp 473–496

  24. Ong CY, Weldon M, Quiring S, Maxwell L, Hughes M, Whelan C, Okoniewski M (2010) Speed it up. IEEE Microw Mag 11(2):70–78

  25. Pieters B, Hollemeersch CF, Lambert P, de Walle RV (2009) Motion estimation for H.264/AVC on multiple GPUs using NVIDIA CUDA. In: Applications of digital image processing XXII, vol 7443, p 74430X

  26. Raravi G, Andersson B (2010) Calculating an upper bound on the finishing time of a group of threads executing on a GPU: a preliminary case study. In: Work-in-progress session of the 16th IEEE international conference on embedded and real-time computing systems and applications, pp 5–8

  27. Sasinowski JE, Strosnider JK (1995) ARTIFACT: a platform for evaluating real-time window system designs. In: Proceedings of the 16th IEEE real-time systems symposium, pp 342–352

  28. Watanabe Y, Itagaki T (2009) Real-time display on Fourier domain optical coherence tomography system using a graphics processing unit. J Biomed Opt 14:060506



Work supported by NSF grants CNS 0834270, CNS 0834132, and CNS 1016954; ARO grant W911NF-09-0535; AFOSR grant FA9550-09-1-0549; and AFRL grant FA8750-11-1-0033.

Author information

Correspondence to Glenn A. Elliott.


About this article

Cite this article

Elliott, G.A., Anderson, J.H. Globally scheduled real-time multiprocessor systems with GPUs. Real-Time Syst 48, 34–74 (2012).



  • Real-time systems
  • Global scheduling
  • Semaphore protocols
  • Bandwidth servers