Abstract
CPU-GPU heterogeneous architectures are now common in a wide variety of computing systems, from mobile devices to supercomputers. Maximizing the throughput of multi-programmed workloads on such systems is indispensable, as a single program typically cannot fully exploit all available resources. At the same time, power consumption is a key concern, and it is often necessary to optimize the power allocation between the CPU and the GPU while enforcing a total power constraint, in particular when power/thermal requirements are strict. The result is a system-wide optimization problem with several knobs. In particular, we focus on (1) co-scheduling decisions, i.e., selecting programs to co-locate in a space-sharing manner; (2) resource partitioning on both CPUs and GPUs; and (3) power capping on both CPUs and GPUs. We solve this problem with machine-learning-based predictive performance modeling that coordinates the settings of all three knobs. Our experimental results on a real system show that our approach achieves a speedup of up to 67% compared to time-sharing-based scheduling with naive power capping that evenly distributes the power budget across components.
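To make the three knobs concrete, the following minimal Python sketch enumerates co-run pairs, CPU/GPU power-cap splits under a fixed total budget, and GPU partition choices, and scores each combination with a stand-in for a learned performance model. The knob ranges, job names, feature values, and the toy predictor are illustrative assumptions only, not the paper's actual models or optimization algorithm.

```python
from itertools import combinations

# Illustrative knob ranges (assumed, not taken from the paper).
TOTAL_POWER_W = 300                                    # total CPU+GPU power budget
CPU_CAPS_W = [80, 100, 120, 140]                       # candidate CPU power caps
GPU_SPLITS = [(1, 7), (2, 6), (4, 4), (6, 2), (7, 1)]  # MIG-style slice splits out of 8

def predict_pair_throughput(job_a, job_b, cpu_cap_w, gpu_cap_w, split):
    """Toy stand-in for the learned performance model. The paper trains a
    regressor on profiling data; this arbitrary analytic form exists only
    so the sketch runs end to end."""
    a_slices, b_slices = split
    gpu_fit = (job_a["gpu_need"] * a_slices + job_b["gpu_need"] * b_slices) / 8.0
    cpu_fit = (job_a["cpu_sens"] + job_b["cpu_sens"]) * cpu_cap_w / TOTAL_POWER_W
    gpu_pow = (job_a["gpu_need"] + job_b["gpu_need"]) * gpu_cap_w / TOTAL_POWER_W
    return gpu_fit + cpu_fit + gpu_pow

# Hypothetical profiled features per job (names and values are made up).
jobs = [
    {"name": "jobA", "cpu_sens": 0.7, "gpu_need": 0.3},
    {"name": "jobB", "cpu_sens": 0.2, "gpu_need": 0.8},
    {"name": "jobC", "cpu_sens": 0.5, "gpu_need": 0.5},
    {"name": "jobD", "cpu_sens": 0.3, "gpu_need": 0.6},
]

best = None
for job_a, job_b in combinations(jobs, 2):        # knob 1: co-scheduling pair
    for cpu_cap_w in CPU_CAPS_W:                  # knob 3: power capping
        gpu_cap_w = TOTAL_POWER_W - cpu_cap_w     # enforce the total power budget
        for split in GPU_SPLITS:                  # knob 2: resource partitioning
            score = predict_pair_throughput(job_a, job_b, cpu_cap_w, gpu_cap_w, split)
            if best is None or score > best[0]:
                best = (score, job_a["name"], job_b["name"], cpu_cap_w, gpu_cap_w, split)

print("best predicted configuration:", best)
```

In practice, the exhaustive loop above would be replaced or pruned by the paper's coordinated optimization; the sketch only shows how a predictive model lets all three knobs be evaluated jointly under one power budget.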
Keywords
- Co-scheduling
- Resource partitioning
- Power capping
- CPU-GPU heterogeneous systems
- Machine learning
Notes
- 1.
If no profile is available for a job (a case not covered in this paper), the job can be excluded from the co-scheduling candidates at the first stage in the diagram and executed exclusively without power capping, while its profile is collected for future reference (see the sketch after these notes).
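A minimal sketch of the fallback described in Note 1, with placeholder helper and field names that are not the paper's API: a job without a stored profile is run exclusively without a power cap and its profile is recorded for later co-scheduling decisions.

```python
def run_exclusively_and_profile(job):
    """Placeholder: launch the job alone and uncapped, collect its profile
    (e.g., hardware counters), and return the measured features. Here it
    just returns dummy values so the sketch is runnable."""
    return {"cpu_sens": 0.5, "gpu_need": 0.5}

def dispatch(job, profiles, coschedule_candidates):
    """Fallback from Note 1: unprofiled jobs bypass co-scheduling once,
    run exclusively, and become candidates on later submissions."""
    if job["name"] not in profiles:
        profiles[job["name"]] = run_exclusively_and_profile(job)
    else:
        coschedule_candidates.append(job)

profiles, candidates = {}, []
dispatch({"name": "jobE"}, profiles, candidates)  # first sight: profiled exclusively
dispatch({"name": "jobE"}, profiles, candidates)  # next time: eligible for co-scheduling
```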
Acknowledgements
This work has received funding under the European Commission’s EuroHPC and H2020 programmes under grant agreement no. 956560 and was supported by the NVIDIA Academic Hardware Grant Program.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Saba, I., Arima, E., Liu, D., Schulz, M. (2022). Orchestrated Co-scheduling, Resource Partitioning, and Power Capping on CPU-GPU Heterogeneous Systems via Machine Learning. In: Schulz, M., Trinitis, C., Papadopoulou, N., Pionteck, T. (eds) Architecture of Computing Systems. ARCS 2022. Lecture Notes in Computer Science, vol 13642. Springer, Cham. https://doi.org/10.1007/978-3-031-21867-5_4
DOI: https://doi.org/10.1007/978-3-031-21867-5_4
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21866-8
Online ISBN: 978-3-031-21867-5
eBook Packages: Computer Science, Computer Science (R0)